Help:What is a Dataset: Difference between revisions

From Ruisdael Observatory Data Catalog

Latest revision as of 16:29, 3 April 2025

The catalog can be used to index many different types of datasets. This page describes the main types of datasets that are accepted and gives guidelines on what should not be indexed.

Acceptable Data Types

  • Model outputs: the output of numerical weather prediction models such as DALES and HARMONIE, and of other types of numerical simulations, reanalyses and forecasts. This category partially overlaps with the "Derived/Processed data" category.
  • In-situ observations: datasets collected directly at the location of interest, for example from rain gauges, anemometers or spectrometers.
  • Remote-sensing observations: datasets collected from a distance, typically using ground-based remote sensors or satellites.
  • Derived/Processed data: datasets obtained by processing, merging or blending different in-situ and/or remote-sensing data, for example physical retrievals, model reanalyses and higher-level datasets based on other datasets.
  • Geospatial data: datasets that include spatial information about the Earth's surface, sub-surface and atmosphere, for example GIS layers, coordinate systems, grids and projections.

What should not be indexed?

  • Software: numerical weather models, toolboxes, etc. should not be indexed. However, the data that they produce can be indexed.
  • Data analysis scripts: Python notebooks and MATLAB, R or C scripts, for example, should not be indexed. They can be documented and distributed through the Ruisdael GitHub page.
  • Articles: peer-reviewed papers, technical reports, presentations and other forms of written or oral publications should not be indexed in the catalog. Please use links to refer to them.
  • Unrelated datasets: datasets that do not have any link to the Ruisdael project should not be indexed.
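
The guidelines above can be read as a simple checklist. The sketch below is only an illustration of that checklist in Python, not part of the catalog software: the names `EntryKind`, `ACCEPTABLE_KINDS` and `can_be_indexed` are hypothetical, and the `linked_to_ruisdael` flag is an assumption about how the "link to the Ruisdael project" rule might be represented.

```python
from enum import Enum


class EntryKind(Enum):
    """Hypothetical labels mirroring the categories listed on this page."""
    MODEL_OUTPUT = "Model outputs"
    IN_SITU = "In-situ observations"
    REMOTE_SENSING = "Remote-sensing observations"
    DERIVED = "Derived/Processed data"
    GEOSPATIAL = "Geospatial data"
    SOFTWARE = "Software"
    ANALYSIS_SCRIPT = "Data analysis scripts"
    ARTICLE = "Articles"


# The five acceptable data types; everything else should not be indexed.
ACCEPTABLE_KINDS = frozenset({
    EntryKind.MODEL_OUTPUT,
    EntryKind.IN_SITU,
    EntryKind.REMOTE_SENSING,
    EntryKind.DERIVED,
    EntryKind.GEOSPATIAL,
})


def can_be_indexed(kind: EntryKind, linked_to_ruisdael: bool) -> bool:
    """Return True only for an acceptable data type linked to the project."""
    return linked_to_ruisdael and kind in ACCEPTABLE_KINDS


if __name__ == "__main__":
    # Output of a model run may be indexed; the model code itself may not.
    print(can_be_indexed(EntryKind.MODEL_OUTPUT, linked_to_ruisdael=True))
    print(can_be_indexed(EntryKind.SOFTWARE, linked_to_ruisdael=True))
```

In practice the category of an entry is a judgment call (for example, reanalyses appear under both model outputs and derived data), so a check like this can only support, not replace, a human decision.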