Help:What is a Dataset

From Ruisdael Observatory Data Catalog

Latest revision as of 14:01, 10 December 2024

This page is under construction. Please come back later.


The catalog can index many different types of datasets. This page describes the main types of datasets that are accepted and gives guidelines on what should not be indexed.

Acceptable Data Types

  • Model outputs: the output of numerical weather prediction models such as DALES and HARMONIE, and of other types of numerical simulations, reanalyses and forecasts. This category partially overlaps with the "Derived/Processed data" category.
  • In-situ observations: datasets collected directly at the location of interest.
  • Remote-sensing observations: datasets collected from a distance, typically using satellites or ground-based remote sensors.
  • Derived/Processed data: highly processed datasets such as blended satellite and in-situ data, physical retrievals, reanalyses and higher-level datasets based on other datasets.
  • Geospatial data: datasets that include spatial information about the Earth's surface, sub-surface and atmosphere, for example GIS layers, coordinate systems, grids and projections.

What should not be indexed?

  • Software: numerical weather models, toolboxes and similar software should not be indexed.
  • Data analysis scripts: Python notebooks and Matlab, R or C scripts, for example, should not be indexed. Such scripts can be documented and distributed through the Ruisdael GitHub page.
  • Articles: peer-reviewed papers, technical reports, presentations and other forms of written or oral publications should not be indexed in the catalog.
  • Unrelated datasets: datasets that do not have any link to the Ruisdael project should not be indexed.
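
The guidelines above can be summarised as a simple check. The sketch below is only an illustration and is not part of the catalog software: the function name `should_index`, the type labels and the `linked_to_ruisdael` flag are all hypothetical, and rejecting any type not listed on this page is an assumption.

```python
# Illustrative sketch only: names and labels are hypothetical,
# not taken from the catalog's actual software.

# Dataset types this page lists as acceptable.
ACCEPTED_TYPES = {
    "model outputs",
    "in-situ observations",
    "remote-sensing observations",
    "derived/processed data",
    "geospatial data",
}

# Item types this page says should not be indexed.
EXCLUDED_TYPES = {
    "software",
    "data analysis scripts",
    "articles",
}


def should_index(item_type: str, linked_to_ruisdael: bool) -> bool:
    """Return True if an item may be indexed under the guidelines above."""
    kind = item_type.strip().lower()
    # Datasets without any link to the Ruisdael project are never indexed.
    if not linked_to_ruisdael:
        return False
    # Software, scripts and articles are explicitly excluded.
    if kind in EXCLUDED_TYPES:
        return False
    # Assumption: types not listed on this page are rejected by default.
    return kind in ACCEPTED_TYPES


if __name__ == "__main__":
    print(should_index("Model outputs", True))
    print(should_index("Software", True))
    print(should_index("Geospatial data", False))
```

In practice a dataset may fit more than one accepted type (for example, reanalyses appear under both model outputs and derived data), so a real check would likely accept a list of types rather than a single label.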