When: 18–22 October 2021
Credit: 1 ECTS
Course lecturers: Øystein Godøy/MET , Torill Hamre/NERSC, Lara Ferrighi/MET, Markus Fiebig/NILU
Course responsible: Joachim Reuder/UiB, Joe LaCasce/UiO
(Registration deadline: 12 October)
This course provides an introduction to the FAIR guiding principles for data management, their specific implementations within geoscience, and practical exercises. Practical steps towards Findable, Accessible, Interoperable and Reusable data are discussed and exercised, emphasising both the data provider and the data consumer perspectives. The practical introductions to the elements of the FAIR guiding principles cover:
- discovery metadata and use metadata;
- persistent identifiers (e.g. Digital Object Identifiers) and how they support traceability of decisions (e.g. through scientific citation of data);
- containers for data (e.g. NetCDF);
- semantics for geoscientific data (glossaries, thesauri, taxonomies and ontologies) in an interdisciplinary context, and terminology as a mechanism for scientific collaboration;
- tools for generating FAIR data (e.g. how to work with Rosetta and other tools for converting data, and how to use Python);
- how to work with FAIR data;
- how to publish data with the help of data centres, and with the help of schema.org (focusing on discoverability by Google);
- national structures that facilitate data sharing (e.g. Norwegian Marine Data Centre, Norwegian Scientific Data Network, Norwegian Infrastructure for Research Data) and how these are connected;
- how to work with Data Management Plans that are, or will be, required by funding agencies and resource providers (e.g. UNINETT Sigma2).
Practical work will be based on students bringing their own data, evaluating their FAIRness, and improving that FAIRness using Rosetta and Python to create NetCDF files according to the Climate and Forecast (CF) Convention with the Attribute Convention for Dataset Discovery (ACDD) embedded.
By the end of the course, students will know the FAIR guiding principles, best practices for FAIR data within geoscience, and practical approaches to achieving FAIR data using Rosetta and Python, as well as how to work with data management plans throughout their future careers.
The course will be held online (Zoom), with the link provided to participants in good time before the course. The first day will be a full day (6 hours) of lectures introducing different concepts, as well as the first assignment, on data curation. The second day will be dedicated to self-study, with students curating their own dataset. Lecturers will be available via Zoom (an open room outside lecture hours) and a dedicated Slack channel throughout the week to support students. On the third day, students will present the results of the first assignment, followed by lectures and an introduction to the second assignment (on data exploitation). After another day of self-study on the assignment (day 4), day 5 concludes with student presentations of the second assignment, further lectures, and a course wrap-up.
A more detailed outline of the lectures is provided below. Students are required to describe and upload the dataset they will work with one week in advance of the course start.
Lecture Day 1, Monday
09:15-10:15: Motivation: Why do we need data management?
- Why do we need data management?
- Data Sharing and Management Snafu in 3 Short Acts
- Science life cycle/Data life cycle
- How to change data sharing culture.
- What are the FAIR data principles?
- How do they help with good data management?
- External boundary conditions by funding agencies and publishers, scientific data as service.
- Data management plan.
10:30-11:45: The basics: data and metadata
- What are data? What are metadata?
- Discovery, site, and use metadata.
- What is provenance?
- Plan your experiment. Which data and metadata do you need to record?
- How to record various types of metadata.
- Metadata templates (Arven etter Nansen, EBAS)
- Gap handling for metadata (missing elements).
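One simple way to handle metadata gaps programmatically is to check a record against a list of required elements before submission. A minimal sketch, assuming a small illustrative subset of ACDD-1.3 discovery attributes (not the full convention):

```python
# Illustrative subset of required ACDD-1.3 discovery attributes.
REQUIRED_ACDD = ["title", "summary", "keywords", "Conventions"]

def missing_acdd(global_attrs: dict) -> list:
    """Return required discovery attributes absent from a metadata record."""
    return [name for name in REQUIRED_ACDD if name not in global_attrs]

# A record with gaps: summary and keywords are missing.
record = {"title": "CTD profile, station M", "Conventions": "CF-1.8, ACDD-1.3"}
print(missing_acdd(record))  # → ['summary', 'keywords']
```

The same check can be run over a directory of files to spot gaps across a whole collection before archiving.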
11:45-12:30: Lunch break
12:30-13:45: Data structure/formatting
- NetCDF/CF grid, trajectory, profile, timeseries
- Standard names, vocabularies
- Granularity requirements
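To make the feature-type and standard-name ideas concrete, here is a hedged sketch of a CF "profile" feature built with xarray; the depths, values and attribute choices are illustrative only.

```python
import numpy as np
import xarray as xr

# Illustrative CF "profile" feature: temperature sampled along depth.
ds = xr.Dataset(
    {"sea_water_temperature": ("depth", np.array([12.1, 11.8, 9.4], dtype="f4"))},
    coords={"depth": ("depth", np.array([0.0, 10.0, 20.0]))},
)

# CF standard names and units on the variables ...
ds["sea_water_temperature"].attrs.update(
    standard_name="sea_water_temperature", units="degree_Celsius"
)
ds["depth"].attrs.update(standard_name="depth", units="m", positive="down")

# ... and the feature type as a global attribute.
ds.attrs["featureType"] = "profile"
```

Grids, trajectories and time series follow the same pattern with different dimensions and coordinate variables; the controlled vocabulary of standard names is what lets tools interpret the variables unambiguously.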
14:00-15:30: Documentation of data
- Tools for documenting data
- Rosetta (web application), NCO/CDO (command line), Python (netcdf4/xarray), R
- More detail on Python
- Validation tools for NetCDF-CF.
- What is actually validated?
- NorDataNet validator, PUMA validator
- Rosetta in more detail
- Profiles, time series, trajectory
- Template concept, benefits for processing multiple datasets, possibilities for collaboration (e.g. place template files in GitHub)
- Examples, e.g. a CTD profile from a Seabird sensor
- Introduction of the assignment
Lecture Day 2, Wednesday
09:00-10:15: Presentation of assignment results and feedback
10:30-11:45: Publishing your data
- Mandated and long term archives
- Data publications
- PID (Explicit mention DOI)
- Data policies / Licensing
- Tracking usage (using DOI)
- NorDataNet (distributed network of data centres)
- NIRD RDA
- GAW repositories
- Repositories for model data
11:45-12:30: Lunch break
12:30-13:45: How to exploit, further process, and consume data
- Interfaces to data
- Examples of benefits when using truly interoperable data.
- Interfaces: WMS, OGC API, OpenAPI, OPeNDAP, and RESTful interfaces in general
- Integration in tools e.g.:
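As a sketch of the consumer side, the same xarray call opens a local NetCDF file or a remote OPeNDAP endpoint; only the path changes. With a remote URL, data are fetched lazily, so slicing transfers just the requested subset. The file name and commented-out URL below are placeholders, not real course endpoints.

```python
import numpy as np
import xarray as xr

# Local stand-in for a served dataset; an OPeNDAP URL would work the same.
ds = xr.Dataset({"air_temperature": ("time", np.arange(5, dtype="f4") + 273.0)})
ds.to_netcdf("served_example.nc")

opened = xr.open_dataset("served_example.nc")
# e.g. opened = xr.open_dataset("https://thredds.example.org/dodsC/some/dataset")

# Slice first, then compute: over OPeNDAP only this slice is transferred.
subset = opened["air_temperature"].isel(time=slice(0, 3))
print(float(subset.mean()))  # → 274.0
```

This interchangeability of local files and remote services is exactly the benefit of truly interoperable data mentioned above.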
14:00-15:30: Intro to Workshop: Analysing data (Torill, Øystein)
Lecture Day 3, Friday
09:00-10:15: Presentation of assignment results and feedback, part 1
10:30-11:45: Presentation of assignment results and feedback, part 2
11:45-12:30: Lunch break
12:30-13:45: Data sharing ethics & culture, and how NorDataNet services help. (Øystein)
- Data sharing ethics, especially before publication
- The data life cycle and its relation to the scientific workflow, revisited from a scientist's point of view
- Data sharing in a cultural perspective and relations to the scientific workflow
- NorDataNet service overview
14:00-15:30: Student summary of the course, what has been useful (and not?).