

Machine Learning and Big Data Research

Learning Data Ethics for Open Data Sharing, an OER developed by Lynnee Argabright

Machine learning research raises special considerations for data sharing, because its authenticity and scholarly value depend on reproducibility and transparency.

For instance, making the training data available so that others can achieve similar results may violate the privacy of subjects within that data, while anonymizing parts of the data may reduce its usefulness for reproducing those results.
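
The trade-off between privacy and utility can be made concrete with a small sketch. The example below assumes k-anonymity as the privacy measure (the smallest number of records sharing the same quasi-identifiers), which is one common choice and not a method prescribed by this guide. The records, field layout, and function names are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, diagnosis). Not real data.
records = [
    (23, "27514", "flu"),
    (27, "27516", "asthma"),
    (31, "27514", "flu"),
    (36, "27517", "diabetes"),
    (42, "27516", "asthma"),
    (48, "27514", "flu"),
]


def generalize(record):
    """Coarsen quasi-identifiers: age to a decade band, ZIP code to 3 digits."""
    age, zip_code, diagnosis = record
    decade = (age // 10) * 10
    return (f"{decade}-{decade + 9}", zip_code[:3] + "**", diagnosis)


def smallest_group(rows):
    """Return k: the size of the smallest group sharing one (age, zip) pair."""
    groups = Counter((age, zip_code) for age, zip_code, _ in rows)
    return min(groups.values())


generalized = [generalize(r) for r in records]

# Privacy side: a larger k means each person is harder to single out.
k_before = smallest_group(records)
k_after = smallest_group(generalized)

# Utility side: fewer distinct age values means less detail for reuse.
distinct_ages_before = len({r[0] for r in records})
distinct_ages_after = len({r[0] for r in generalized})

print(f"k before: {k_before}, k after: {k_after}")
print(f"distinct ages before: {distinct_ages_before}, after: {distinct_ages_after}")
```

In this toy case, generalizing raises k from 1 to 2 but cuts the distinct age values from 6 to 3, so someone trying to reproduce an age-sensitive result would have less to work with.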

The journal, repository, or conference where the research is shared may offer procedures that restrict access to specific, appropriate uses. The documentation you write should explain how the research data was generated, how the AI model operates (without necessarily explaining highly specialized technical code), and which choices and assumptions made during the research process would have affected the model’s predictions. The NeurIPS conference has adopted a reproducibility checklist, developed by Joelle Pineau of McGill University, to help guide researchers in curating their data and models to support reproducibility.
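
As one possible way to capture part of this documentation in a machine-readable form, the sketch below writes a small "run manifest" alongside shared data. The field names and the `build_manifest` helper are illustrative assumptions, not items taken from the reproducibility checklist or required by any repository.

```python
import hashlib
import json
import platform
import random


def build_manifest(data_bytes, seed, notes):
    """Collect basic facts a reader would need to rerun an analysis."""
    return {
        # Fingerprint of the exact data file used, so others can verify it.
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        # Seed used for any random steps (sampling, shuffling, initialization).
        "random_seed": seed,
        # Interpreter version, since results can differ across versions.
        "python_version": platform.python_version(),
        # Free-text description of how the data was generated or filtered.
        "notes": notes,
    }


seed = 1234

# Re-seeding reproduces the same pseudo-random draws.
random.seed(seed)
sample = [random.random() for _ in range(3)]
random.seed(seed)
repeat = [random.random() for _ in range(3)]

# Hypothetical data contents and notes, for illustration only.
manifest = build_manifest(
    b"age,zip\n23,27514\n",
    seed,
    "Toy example; ages generalized to decade bands before sharing.",
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

A file like this does not replace a written explanation of the research process, but it gives reusers a quick way to check that they are starting from the same data and settings.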

Machine learning research in particular can be reused for purposes very different from those it was created for. Your methodologies, data points, and data inferences could be used by others to make decisions about people in ways that harm the very subjects your research was meant to support. When preparing to share your research, you should therefore deliberately consider and acknowledge the limitations and capabilities of your models and data.

The White House Office of Science and Technology Policy (OSTP) has identified five human rights principles to guide the design, use, and deployment of AI systems: setting up safe and effective systems, protecting against algorithmic discrimination, building in data privacy protections, providing visible and understandable notices and explanations about the AI system, and enabling alternatives such as opt-out options.

This guide does not specifically address ethical considerations for big data. As this is a developing topic of discussion among data librarians, here is an initial source for thinking about how ethics should relate to the use, sharing, and reuse of big data: An Ethics Framework for Big Data in Health and Research (a 2019 article providing a step-by-step process for resolving ethical issues arising from big data in health research).

Sources

Bechmann, A., & Zevenbergen, B. (2020). AI and Machine Learning: Internet Research Ethics Guidelines, IRE 3.0 Companion 6.1. Association of Internet Researchers. https://aoir.org/reports/ethics3.pdf (pp. 43-45)
The White House Office of Science and Technology Policy (OSTP). (2022, October). The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People [White paper]. https://www.whitehouse.gov/ostp/ai-bill-of-rights/
Xafis, V., Schaefer, G. O., Labude, M. K., et al. (2019). An Ethics Framework for Big Data in Health and Research. Asian Bioethics Review, 11, 227–254. https://doi.org/10.1007/s41649-019-00099-x
