When: 18–22 October 2021
Credit: 1 ECTS
Course lecturers: Øystein Godøy/MET , Torill Hamre/NERSC, Lara Ferrighi/MET, Markus Fiebig/NILU
Course responsible: Joachim Reuder/UiB, Joe LaCasce/UiO
(Registration deadline: 12 October)
This course provides an introduction to the FAIR guiding principles for data management, their specific implementations within geoscience, and practical exercises. Practical steps towards Findable, Accessible, Interoperable and Reusable data are discussed and exercised, emphasising both the data provider and the data consumer perspectives. The practical introductions to the elements of the FAIR guiding principles cover:
- discovery metadata and use metadata;
- persistent identifiers (e.g. Digital Object Identifiers) and how they support traceability of decisions (e.g. through scientific citation of data);
- containers for data (e.g. NetCDF);
- semantics for geoscientific data (glossaries, thesauri, taxonomies and ontologies) in an interdisciplinary context, and terminology as a mechanism for scientific collaboration;
- tools for generating FAIR data (e.g. how to work with Rosetta and other tools for converting data, and how to use Python);
- how to work with FAIR data;
- how to publish data with the help of data centres, and with the help of schema.org (focusing on discoverability by Google);
- national structures that facilitate data sharing (e.g. Norwegian Marine Data Centre, Norwegian Scientific Data Network, Norwegian Infrastructure for Research Data) and how these are connected;
- how to work with Data Management Plans that are, or will be, required by funding agencies and resource providers (e.g. UNINETT Sigma2).
Practical work will be based on students bringing their own data, evaluating their FAIRness, and improving that FAIRness using Rosetta and Python to create NetCDF files according to the Climate and Forecast (CF) Convention with the Attribute Convention for Dataset Discovery (ACDD) embedded.
By the end of the course, students will know the FAIR guiding principles, best practices for FAIR data within geoscience, and practical approaches to achieving FAIR data using Rosetta and Python, as well as how to work with data management plans throughout their future careers.
The course will be held online (Zoom), with the link provided to participants in good time before the course. The first day will be a full day (6 hours) of lectures introducing different concepts, as well as the first assignment, on data curation. The second day will be dedicated to self-study, with students curating their own dataset. Lecturers will be available via Zoom (an open room outside lecture hours) and a dedicated Slack channel throughout the week to support students. On the third day, students will present the results of the first assignment, followed by lectures and an introduction to the second assignment (on data exploitation). After another day of self-study on the assignment (day 4), day 5 concludes with student presentations of the second assignment, further lectures, and a course wrap-up.
A more detailed outline of the lectures is provided below. Students are required to describe and upload the dataset they will work with one week in advance of the course start.
Lecture Day 1, Monday
09:15-10:15: Motivation: Why do we need data management?
- Why do we need data management?
- Data Sharing and Management Snafu in 3 Short Acts
- Science life cycle/Data life cycle
- How to change data sharing culture.
- What are the FAIR data principles?
- How do they help with good data management?
- External boundary conditions by funding agencies and publishers, scientific data as service.
- Data management plan.
10:30-11:45: The basics: data and metadata
- What are data? What are metadata?
- Discovery, site, and use metadata.
- What is provenance?
- Plan your experiment. Which data and metadata do you need to record?
- How to record various types of metadata.
- Metadata templates (Arven etter Nansen, EBAS)
- Gap handling for metadata (missing elements).
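One simple way to handle metadata gaps programmatically is to check a record against a list of required elements before submission. A minimal sketch, assuming a small illustrative subset of ACDD-1.3 discovery attributes (not the full convention):

```python
# Illustrative subset of required ACDD-1.3 discovery attributes.
REQUIRED_ACDD = ["title", "summary", "keywords", "Conventions"]

def missing_acdd(global_attrs: dict) -> list:
    """Return required discovery attributes absent from a metadata record."""
    return [name for name in REQUIRED_ACDD if name not in global_attrs]

# A record with gaps: summary and keywords are missing.
record = {"title": "CTD profile, station M", "Conventions": "CF-1.8, ACDD-1.3"}
print(missing_acdd(record))  # → ['summary', 'keywords']
```

The same check can be run over a directory of files to spot gaps across a whole collection before archiving.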
11:45-12:30: Lunch break
12:30-13:45: Data structure/formatting
- NetCDF/CF grid, trajectory, profile, timeseries
- Standard names, vocabularies
- Granularity requirements
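To make the feature-type and standard-name ideas concrete, here is a hedged sketch of a CF "profile" feature built with xarray; the depths, values and attribute choices are illustrative only.

```python
import numpy as np
import xarray as xr

# Illustrative CF "profile" feature: temperature sampled along depth.
ds = xr.Dataset(
    {"sea_water_temperature": ("depth", np.array([12.1, 11.8, 9.4], dtype="f4"))},
    coords={"depth": ("depth", np.array([0.0, 10.0, 20.0]))},
)

# CF standard names and units on the variables ...
ds["sea_water_temperature"].attrs.update(
    standard_name="sea_water_temperature", units="degree_Celsius"
)
ds["depth"].attrs.update(standard_name="depth", units="m", positive="down")

# ... and the feature type as a global attribute.
ds.attrs["featureType"] = "profile"
```

Grids, trajectories and time series follow the same pattern with different dimensions and coordinate variables; the controlled vocabulary of standard names is what lets tools interpret the variables unambiguously.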
14:00-15:30: Documentation of data
- Tools for documenting data
- Rosetta (web application), NCO/CDO (command line), Python (netcdf4/xarray), R
- More detail on Python
- Validation tools for NetCDF-CF.
- What is actually validated?
- NorDataNet validator, PUMA validator
- Rosetta in more detail
- Profiles, time series, trajectory
- Template concept, benefits for processing multiple datasets, possibilities for collaboration (e.g. place template files in GitHub)
- Examples, e.g. a CTD profile from a Seabird sensor
- Introduction of the assignment
Lecture Day 2, Wednesday
09:00-10:15: Presentation of assignment results and feedback
10:30-11:45: Publishing your data
- Mandated and long term archives
- Data publications
- PID (Explicit mention DOI)
- Data policies / Licensing
- Tracking usage (using DOI)
- NorDataNet (distributed network of data centres)
- NIRD RDA
- GAW repositories
- Repositories for model data
11:45-12:30: Lunch break
12:30-13:45: How to exploit, further process, and consume data
- Interfaces to data
- Examples of benefits when using truly interoperable data.
- Interfaces: WMS, OGC API, OpenAPI, OPeNDAP, and RESTful interfaces in general
- Integration in tools e.g.:
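As a sketch of the consumer side, the same xarray call opens a local NetCDF file or a remote OPeNDAP endpoint; only the path changes. With a remote URL, data are fetched lazily, so slicing transfers just the requested subset. The file name and commented-out URL below are placeholders, not real course endpoints.

```python
import numpy as np
import xarray as xr

# Local stand-in for a served dataset; an OPeNDAP URL would work the same.
ds = xr.Dataset({"air_temperature": ("time", np.arange(5, dtype="f4") + 273.0)})
ds.to_netcdf("served_example.nc")

opened = xr.open_dataset("served_example.nc")
# e.g. opened = xr.open_dataset("https://thredds.example.org/dodsC/some/dataset")

# Slice first, then compute: over OPeNDAP only this slice is transferred.
subset = opened["air_temperature"].isel(time=slice(0, 3))
print(float(subset.mean()))  # → 274.0
```

This interchangeability of local files and remote services is exactly the benefit of truly interoperable data mentioned above.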
14:00-15:30: Intro to Workshop: Analysing data (Torill, Øystein)
Lecture Day 3, Friday
09:00-10:15: Presentation of assignment results and feedback, part 1
10:30-11:45: Presentation of assignment results and feedback, part 2
11:45-12:30: Lunch break
12:30-13:45: Data sharing ethics & culture, and how NorDataNet services help. (Øystein)
- Data sharing ethics, especially before publication
- The data life cycle and its relation to the scientific workflow, revisited from a scientist's point of view
- Data sharing in a cultural perspective and relations to the scientific workflow
- NorDataNet service overview
14:00-15:30: Student summary of the course, what has been useful (and not?).