Learning Data Ethics for Open Data Sharing

Machine Learning and Big Data Research

Machine learning research raises special considerations for data sharing, because reproducibility and transparency are needed to establish its authenticity and scholarly value.

For instance, making the training data available so that others can reproduce your results may violate the privacy of the subjects in that data. Yet anonymizing part of the data may reduce its usefulness for reproducing those results.

The journal, repository, or conference where you share this research may offer procedures that restrict access to specific, appropriate uses. Your documentation should explain how the research data was generated, how the AI model operates (without necessarily explaining highly specialized technical code), and which decisions made during the research process may have affected the model's predictions. The NeurIPS conference has adopted a reproducibility checklist, developed at McGill University, to help guide researchers in curating their data and models to support reproducibility.

Machine learning research in particular can be reused for purposes very different from those it was originally created for. Others could use your methodologies, data points, and data inferences to make decisions about people in ways that harm the very subjects your research was meant to support. When preparing to share your research, you should therefore consciously consider and acknowledge the limitations and capabilities of your models and data.

The White House Office of Science and Technology Policy (OSTP) has identified five principles, published as the Blueprint for an AI Bill of Rights, to guide the design, use, and deployment of AI systems: ensuring safe and effective systems, protecting against algorithmic discrimination, building in data privacy protections, providing clear and understandable notice and explanation about the AI system, and offering alternatives such as the option to opt out.

This guide does not specifically address ethical considerations for big data. Because this is a developing topic of discussion among data librarians, here is an initial source for thinking about how ethics should apply to the use, sharing, and reuse of big data: An Ethics Framework for Big Data in Health and Research (a 2019 article providing a step-by-step process for resolving ethical issues arising from big data in health research).

Sources
