Learning Data Ethics for Open Data Sharing

Curation Workflows and Checklists

Curation Workflows

Curation is the process of preparing data for sharing and preservation/archiving. It typically involves reviewing data for operability, understandability, and discoverability. For example, when curating, you may check that files are not corrupt and are in suitably accessible formats. Curation is often left to the last minute in a research workflow; do your best not to let that happen! Note that curation is not the same as verification, a separate set of steps concerned with scientific quality and the accuracy of output files, which could involve rerunning all the experiments done in the study. Here’s a lightning talk (watch 13:12-20:21; alternatively, see video below) about the difference between curation and verification.
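One common way to check that files were not corrupted or silently altered on their way to the curator is a fixity check: the depositor records a checksum for every file, and the curator recomputes and compares them. The Python sketch below assumes a simple in-memory manifest that maps relative file paths to SHA-256 digests; the function names `sha256_of`, `build_manifest`, and `verify_manifest` are hypothetical, and repositories may use their own manifest formats (for example, BagIt). Keep in mind that a matching checksum only shows a file is unchanged since the manifest was made, not that its contents are valid.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX-style) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, expected: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against an expected manifest.

    Returns sorted lists of files that are missing, unexpected, or whose
    checksums differ (a possible sign of corruption or an undocumented change).
    """
    actual = build_manifest(root)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "mismatched": sorted(
            name
            for name in expected.keys() & actual.keys()
            if expected[name] != actual[name]
        ),
    }
```

In this sketch, the depositor would run `build_manifest` before transferring the files and the curator would run `verify_manifest` on the received copy; anything listed as missing, unexpected, or mismatched is a reason to contact the depositor before going further.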

Various workflows and checklists have been created to help streamline the curation process. If curation and documentation have been happening throughout the research process, final curation should go smoothly. If they have not, you will likely face long delays between steps of the curation workflow.

ICPSR is a repository with dedicated staff who curate researchers’ data before the files are deposited in the repository. In this webinar (watch 10:23-17:38; alternatively, see video below), they describe their curation process, including disclosure risk review (see Risk Assessment and De-identification for more).

For those of us who guide others through curation rather than doing the curation and anonymization ourselves, Pisani et al. (2019) offer a case study of a research group at Crisis Text Line, a nonprofit that collected data from a texting hotline for people in crisis. The case study details how the group formed a data ethics collaboration committee, then identified and launched a protocol model and appropriate technical solutions for sharing these data ethically.

Another study, by O’Donnell and Brundy (2022), details the risk assessment process as a collaborative workflow model.

Checklists

The Data Curation Network created the C.U.R.A.T.E.D. checklist of steps to perform on data to ensure they are ready for deposit. It is an important workflow resource for data curators. The version below includes key ethical considerations for each step of the checklist.

CURATED refers to:
Check files and read documentation (risk mitigation, file inventory, appraisal/selection)
Understand the data (or try to), if not… (run files/environment, QA/QC issues, readme)
Request missing information or changes (tracking provenance of any changes and why)
Augment metadata for findability (DOIs, metadata standards, discoverability)
Transform file formats for reuse (data preservation, conversion tools, data viz)
Evaluate for FAIRness (licenses, responsibility standards, metrics for tracking use)
Document your curation activities (Curator Log, correspondence)
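Parts of the first and fifth steps (building a file inventory and transforming file formats for reuse) can be supported with a small script. The Python sketch below walks a deposit folder, records each file's extension and size, and flags extensions for which an open format is often suggested. The `SUGGESTED_CONVERSIONS` table and the function names `file_inventory` and `summarize` are hypothetical and illustrative only; check the target repository's own preferred-format guidance before converting anything.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path

# Illustrative mapping only (an assumption, not any repository's official policy):
# proprietary or less-open extensions -> a commonly suggested open alternative.
SUGGESTED_CONVERSIONS = {
    ".xlsx": ".csv",
    ".xls": ".csv",
    ".sav": ".csv",  # SPSS
    ".dta": ".csv",  # Stata
    ".docx": ".pdf",
    ".doc": ".pdf",
}


def file_inventory(root: Path) -> list[dict[str, object]]:
    """List every file under root with its extension, size, and a suggested open format (or None)."""
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        ext = path.suffix.lower()  # case-insensitive match, e.g. ".SAV" -> ".sav"
        rows.append({
            "file": path.relative_to(root).as_posix(),
            "extension": ext or "(none)",
            "bytes": path.stat().st_size,
            "suggested_format": SUGGESTED_CONVERSIONS.get(ext),
        })
    return rows


def summarize(rows: list[dict[str, object]]) -> Counter:
    """Count files per extension, e.g. to compare against the depositor's readme."""
    return Counter(row["extension"] for row in rows)
```

Flagged files are candidates for conversion, not automatic conversions: turning an SPSS .sav file into .csv, for example, drops variable and value labels, so that information would need to be captured in a codebook or readme, and the conversion itself recorded in the curator log.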

The Poverty Action Lab also provides a checklist of curation steps.

Beyond these operational checklist steps, here are some more holistic questions to ask yourself when depositing datasets:
