Learning Data Ethics for Open Data Sharing

Curation Workflows and Checklists

Curation Workflows

Curation is the process of preparing data for sharing and for preservation or archiving. It typically involves reviewing whether the data are operable, understandable, and discoverable. For example, when curating, you may check that files are not corrupt and are in suitably accessible formats. This workflow step is often left to the last minute; do your best not to let it be! Note that curation is not the same as verification, which can involve a different set of steps related to scientific quality and the accuracy of file outputs, and could even mean rerunning all the experiments done in the study. Here’s a lightning talk (watch 13:12 – 20:21; alternatively, see video below) about the difference between curation and verification.
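As a rough illustration of the file checks mentioned above, here is a minimal Python sketch that walks a folder and flags files that are empty, cannot be read all the way through, or are not in a format on an allowlist. The function names (check_file, check_folder) and the ACCESSIBLE_SUFFIXES list are hypothetical and chosen only for this example; your repository's own guidance on preferred formats should take precedence. Reading a file end to end only catches basic storage or permission problems; it does not prove that the contents are valid for their format.

```python
from pathlib import Path

# Hypothetical allowlist of open, widely readable formats (an assumption for
# this sketch); replace it with the formats your repository recommends.
ACCESSIBLE_SUFFIXES = {".csv", ".tsv", ".txt", ".json", ".xml", ".pdf"}


def check_file(path: Path) -> list[str]:
    """Return a list of human-readable problems found for one file."""
    problems = []

    # An empty file usually signals a failed export or an interrupted copy.
    if path.stat().st_size == 0:
        problems.append("file is empty")

    # Flag formats that are not on the allowlist.
    if path.suffix.lower() not in ACCESSIBLE_SUFFIXES:
        problems.append(f"format {path.suffix or '(none)'} is not on the allowlist")

    # Read the whole file in chunks to surface basic read errors.
    try:
        with path.open("rb") as handle:
            while handle.read(1024 * 1024):
                pass
    except OSError as error:
        problems.append(f"could not be read: {error}")

    return problems


def check_folder(folder: Path) -> dict[str, list[str]]:
    """Map each problematic file (relative path) to its list of problems."""
    report = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            problems = check_file(path)
            if problems:
                report[str(path.relative_to(folder))] = problems
    return report
```

Calling check_folder(Path("my_dataset")) on a deposit folder (the folder name here is a placeholder) returns only the files that need attention, which can serve as a starting point for a conversation with the data depositor.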

Various workflows and checklists have been created to help streamline the curation process. If curation and documentation have been happening throughout the research data process, the final curation should go smoothly. If they have not, you will likely encounter long pauses between each check in the curation workflow.

ICPSR is a repository with dedicated staff who curate researchers' data before the files are deposited in the repository. In this webinar (watch from 10:23 to 17:38; alternatively, see video below), they describe their curation process, including risk disclosure.

For those of us who guide others on curation rather than doing the curation and anonymization ourselves, Pisani et al. (2019) offer a useful case study of a research group at Crisis Text Line, a nonprofit company that collected data from a texting hotline for people in crisis. The case study details the process this group went through to form a data ethics collaboration committee, and to identify and launch a protocols model and appropriate technical solutions for sharing this data ethically.

Another study, by O’Donnell and Brundy (2022), details the risk assessment process as a collaborative workflow model.


Checklists

The Data Curation Network created the C.U.R.A.T.E.D. checklist of steps to perform on data to ensure it is ready for deposit. It is a significant workflow resource for data curators. This version includes key ethical considerations for each step of the checklist.

CURATED refers to:

The Poverty Action Lab also provides a curation checklist of steps.

Besides these operational checklist steps, here are some more holistic questions to ask yourself as you are depositing datasets:

Sources
