Learning Data Ethics for Open Data Sharing
Lynnee Argabright
See more about Risk Assessment
Refer to the “Risk Assessment and De-identification” page for a guide on removing sensitive information from your dataset.
If you haven’t already, play Level 7 of the League of Data game (https://lod.sshopencloud.eu/LodGame/). It helps you think about what information is sensitive. Source: SSHOC. (2020). Data Publication Challenge [video game]. Social Sciences and Humanities Open Cloud (SSHOC) League of Data (LOD). https://lod.sshopencloud.eu/
Personally identifiable information (PII) consists of references, or variables, in data that can disclose a person’s identity. PII can be either direct identifiers, such as a name or Social Security number, or indirect identifiers: characteristics like occupation or salary that, when combined, can make an individual unique.
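To make the point about indirect identifiers concrete, here is a minimal Python sketch, assuming a small hypothetical table (the column names `occupation` and `salary_band` and all values are invented for illustration). It counts how many records share each combination of indirect identifiers; a combination that occurs only once marks a record that could single out one person. The size of the smallest group is what the k-anonymity model calls k.

```python
from collections import Counter

# Hypothetical toy records: direct identifiers (names, SSNs) are already gone,
# leaving only indirect identifiers (quasi-identifiers).
records = [
    {"occupation": "nurse", "salary_band": "50-75k"},
    {"occupation": "nurse", "salary_band": "75-100k"},
    {"occupation": "teacher", "salary_band": "50-75k"},
    {"occupation": "teacher", "salary_band": "75-100k"},
    {"occupation": "nurse", "salary_band": "50-75k"},
]

def combination_counts(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)

def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group sharing one combination (k in k-anonymity)."""
    return min(combination_counts(rows, quasi_identifiers).values())

def unique_rows(rows, quasi_identifiers):
    """Rows whose combination of quasi-identifier values appears only once."""
    counts = combination_counts(rows, quasi_identifiers)
    return [row for row in rows
            if counts[tuple(row[q] for q in quasi_identifiers)] == 1]

for columns in (["occupation"], ["salary_band"], ["occupation", "salary_band"]):
    print(columns, "k =", smallest_group_size(records, columns),
          "unique rows:", len(unique_rows(records, columns)))
```

In this toy table neither column alone isolates anyone (the smallest group for each is 2), yet combining the two leaves three of the five records unique, which is exactly how indirect identifiers can add up to a re-identification risk.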
Depending on the kind of data you’re working with, certain direct identifiers must be protected by law. HIPAA and FERPA are U.S. federal laws regulating the collection and exchange of health and student information, respectively. These regulations specify particular direct identifiers and set requirements that must be followed for their proper use and disclosure.
More examples of personal identifiers (including HIPAA- and FERPA-specific ones) are listed in the “NC State IRB Guidance: Identifiable Data Sets” document on NC State's Research Administration and Compliance website.
Curation is the process of preparing data for sharing and preservation/archiving. It typically reviews whether data are operational, understandable, and discoverable. For example, when curating, you may check that files are not corrupt and are in suitably accessible formats. Curation is often left to the last minute in a research workflow; do your best not to let that happen! Note that curation is not the same as verification, which can be a separate set of steps concerned with scientific quality and the accuracy of file outputs, and could involve rerunning all the experiments done in the study. Here’s a lightning talk (watch 13:12–20:21) about the difference between curation and verification.

Various workflows and checklists have been created to help streamline the curation process. If curation and documentation have been happening throughout the research process, the final curation should go smoothly. If they have not, you will likely encounter long pauses between each check in the curation workflow.
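As a minimal sketch of the kind of file check described above, the Python below builds a simple file inventory: it records each file's size and SHA-256 checksum (a fixity value that lets you confirm later copies have not changed) and flags empty files and file extensions outside a short list. That list, `PREFERRED_EXTENSIONS`, is a hypothetical assumption for illustration, not any repository's official format policy. A checksum also cannot prove that a file opens correctly, so opening files in suitable software remains a separate check.

```python
import hashlib
from pathlib import Path

# Hypothetical set of formats treated as broadly accessible for this sketch;
# replace it with your repository's own preferred-format guidance.
PREFERRED_EXTENSIONS = {".csv", ".txt", ".md", ".json", ".xml", ".pdf", ".tif", ".tiff"}

def sha256_of(path, chunk_size=65536):
    """Compute a SHA-256 checksum in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def inventory(folder):
    """List every file under folder with its size, checksum, and curation flags."""
    rows = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        flags = []
        size = path.stat().st_size
        if size == 0:
            flags.append("empty file")
        if path.suffix.lower() not in PREFERRED_EXTENSIONS:
            flags.append("check format: " + (path.suffix or "no extension"))
        rows.append({
            "file": str(path.relative_to(folder)),
            "bytes": size,
            "sha256": sha256_of(path),
            "flags": flags,
        })
    return rows
```

For example, calling `inventory("my_dataset")` (a hypothetical folder name) returns one dictionary per file, which you could save alongside your curation notes and compare against later copies of the deposit.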
ICPSR is a repository with dedicated staff who curate data for researchers before the data files are deposited in the repository. In this webinar (watch 10:23–17:38), they describe their curation process, including disclosure risk review.

For those of us who guide others on curation rather than doing the curation and anonymization ourselves, Pisani et al. (2019) present a case study of a research group at Crisis Text Line, a nonprofit organization that collected data from a texting hotline for people in crisis. The case study details the process this group went through to form a data ethics collaboration committee and to identify and launch a protocol model and appropriate technical solutions for sharing this data ethically.
Another study, by O’Donnell and Brundy (2022), details the risk assessment process as a collaborative workflow model.
Checklists
The Data Curation Network created the C.U.R.A.T.E.D. checklist of steps to perform on data to ensure they are ready for deposit. It is a major workflow resource for data curators. This version includes key ethical considerations for each step of the checklist.
CURATED refers to:
Check files and read documentation (risk mitigation, file inventory, appraisal/selection)
Understand the data (or try to), if not… (run files/environment, QA/QC issues, readme)
Request missing information or changes (tracking provenance of any changes and why)
Augment metadata for findability (DOIs, metadata standards, discoverability)
Transform file formats for reuse (data preservation, conversion tools, data viz)
Evaluate for FAIRness (licenses, responsibility standards, metrics for tracking use)
Document your curation activities (Curator Log, correspondence)
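The last step, documenting curation activities, can be as simple as a dated log. Below is a hypothetical Python sketch of such a Curator Log, assuming one note per action tagged with the CURATED step it belongs to. It is not the Data Curation Network's own template; the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The seven CURATED steps, named as in the checklist above.
CURATED_STEPS = [
    "Check files and read documentation",
    "Understand the data",
    "Request missing information or changes",
    "Augment metadata for findability",
    "Transform file formats for reuse",
    "Evaluate for FAIRness",
    "Document your curation activities",
]

@dataclass
class LogEntry:
    step: str
    note: str
    timestamp: str

@dataclass
class CuratorLog:
    """A hypothetical, minimal curator log: one timestamped note per action."""
    dataset: str
    entries: list = field(default_factory=list)

    def record(self, step, note):
        """Add a note under one CURATED step, rejecting unknown step names."""
        if step not in CURATED_STEPS:
            raise ValueError(f"Unknown step: {step}")
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.entries.append(LogEntry(step, note, stamp))

    def outstanding_steps(self):
        """Steps with no log entry yet, in checklist order."""
        done = {entry.step for entry in self.entries}
        return [step for step in CURATED_STEPS if step not in done]
```

Calling `outstanding_steps()` before deposit shows which checklist steps have no recorded activity yet, so gaps in the curation record are easy to spot.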
Besides these operational checklist steps, here are some more holistic questions to ask yourself as you are depositing datasets:
How long will the data exist in this repository?
Did you get consent from your participants for subsequent data use?
What sort of ethical responsibilities will future users have if they want to reuse your data?
How will you ensure that appropriate data provenance (i.e., the earliest known original of this final data: your repository data record) and ownership (i.e., you and/or the human subjects’ community) are maintained if future users want to reuse your data?
Could the de-identified data still be re-identified? Would leaving the data identifiable put participants at risk?
Sources
Choate, R., Adeniyi, K., Akbarifard, A., Beaubien, A., Imbody, S., & Curation Unit. (2021, October 8). ICPSR Curation: The Who, What, Where, Why, and How of Curating Data at ICPSR [Presentation]. 2021 ICPSR Biennial Meeting. ICPSR YouTube Channel. https://youtu.be/AqRRccPpRcw?list=PLqC9lrhW1VvbtV7GtM4u4ZnI1RsDKHIBj&t=623
Markham, A. (2012). Charting Ethical Questions by Data and Type. In Ethical Decision-Making and Internet Research 2.0. Association of Internet Researchers. https://aoir.org/reports/ethics2.pdf
O'Donnell, M. N., & Brundy, C. (2022). Bringing All the Stakeholders to the Table: A Collaborative Approach to Data Sharing. Journal of eScience Librarianship, 11(1), 2. https://doi.org/10.7191/jeslib.2022.1224
Pisani, A. R., Kanuri, N., Filbin, B., Gallo, C., Gould, M., Lehmann, L. S., Levine, R., Marcotte, J. E., Pascal, B., Rousseau, D., Turner, S., Yen, S., & Ranney, M. L. (2019). Protecting User Privacy and Rights in Academic Data-Sharing Partnerships: Principles From a Pilot Program at Crisis Text Line. Journal of Medical Internet Research, 21(1), 1-11. https://doi.org/10.2196/11507