Effects of Good/Bad Data Ethics
Take a look at this comic by Mona Chalabi.
Now take a look at this presentation of a CNN news poll.
Chalabi’s comic and CNN’s exit poll table both collapse race into a small set of categories. Doing so protects the identities of participants, but it can also fail to represent, or even misrepresent, the people the data is about. Sharing data isn’t just about final datasets such as spreadsheets or transcripts; what you share also includes the visualizations you use to communicate the results.
If either of these examples were the final data you shared, what would collapsing these categories mean when weighed against the ethical principles of justice and beneficence and the risk of disclosure?
How might removing selected information from a dataset, including stripping out identifiers, distort it so that it no longer represents what it was intended to represent?
Listen to about 12 minutes (28:28–40:18) of this podcast from Data & Society, which gives voice to Indigenous populations.
Through these stories about individuals, the speaker notes that sometimes to be named is to be counted and acknowledged, and that naming can enable a group to seek justice. Let populations speak for themselves in their own voices rather than boiling their qualitative individuality down into a data point. Talk with their communities to learn how best to uphold their privacy and how best to reflect their interests and priorities.
Here are some resources that provide guidelines:
- CARE Principles for Indigenous Data Governance
- Operationalizing the CARE and FAIR Principles for Indigenous data futures (2021 article)
- The San Code of Research Ethics (developed in 2017 by an Indigenous group in Cape Town, South Africa)
- NIH policy for Responsible Management and Sharing of American Indian/Alaska Native Participant Data (2022)
The Equitable Open Data Report (2017), by the Detroit Digital Justice Coalition and the Detroit Community Technology Project, grew out of direct feedback from Detroit residents about the benefits and harms they saw in Detroit’s Open Data Portal. That feedback was consolidated into recommendations to support conversations and inform policy provisions related to the city’s collection, dissemination, and use of open data.
As identified by Tarrant et al. (2020), you can assess the risk of disclosure with the following questions (a simple scoring sketch follows the list):
- What is the probability of an attacker attempting to re-identify an individual?
- What is the probability of an attacker succeeding in re-identifying an individual?
- What are the consequences to the individual if they are re-identified?
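One way to make these questions concrete is to score each one and combine the scores. The Python sketch below is only an illustration of that idea, not a method from the ODI report; the scales, weights, and thresholds are assumptions you would need to adapt to your own data and community context.

```python
# Illustrative disclosure-risk scoring sketch (not from the ODI report).
# Scales, labels, and thresholds are assumptions for demonstration only.

def disclosure_risk(p_attempt: float, p_success: float, consequence: int) -> dict:
    """Combine the three risk-of-disclosure questions into a rough rating.

    p_attempt   -- estimated probability (0-1) that anyone tries to re-identify
    p_success   -- estimated probability (0-1) that an attempt would succeed
    consequence -- harm to the individual if re-identified, scored 1 (minor) to 5 (severe)
    """
    likelihood = p_attempt * p_success   # chance a re-identification actually happens
    score = likelihood * consequence     # weight that likelihood by severity of harm
    if score < 0.5:
        rating = "lower risk"
    elif score < 2.0:
        rating = "moderate risk - consider further de-identification"
    else:
        rating = "higher risk - do not share in this form"
    return {"likelihood": likelihood, "score": score, "rating": rating}

# Example: a small, detailed dataset about a marginalized community
print(disclosure_risk(p_attempt=0.6, p_success=0.5, consequence=4))
# {'likelihood': 0.3, 'score': 1.2, 'rating': 'moderate risk - consider further de-identification'}
```

However you score it, the numbers are only a prompt for judgment; the conversation with the affected community still matters more than the arithmetic.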
An impact model (Markham 2020) can be a helpful assessment tool for breaking down the ethical considerations of your dissemination practices. Think about each of the following impact areas with regard to lower- to higher-level granularity and shorter- to longer-term impact (a small organizing sketch follows the list):
- Immediate treatment of people
- Side effects resulting from research data
- Use of data after or beyond initial analysis
- Long term forecasting of data use
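If it helps to keep notes against this model in a structured form, one minimal sketch is shown below; the field names, scales, and example entries are assumptions for illustration, not part of Markham’s model itself.

```python
# Minimal sketch for organizing notes against the four impact areas above.
# Field names, scales, and example entries are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class ImpactNote:
    area: str         # one of the four impact areas listed above
    granularity: str  # "lower" to "higher" level of granularity
    horizon: str      # "shorter" to "longer" term of impact
    notes: str        # your assessment for this dissemination practice

assessment = [
    ImpactNote("Immediate treatment of people", "higher", "shorter",
               "Consent language covers sharing of de-identified transcripts."),
    ImpactNote("Use of data after or beyond initial analysis", "lower", "longer",
               "Repository license permits reuse; community review requested first."),
]

for note in assessment:
    print(f"{note.area} ({note.granularity} granularity, {note.horizon} term): {note.notes}")
```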
Sources:
- Markham, A. (2020). An “Impact Model” for ethical assessment. IRE 3.0 Companion 6.4 (pp. 76–77). Association of Internet Researchers. https://aoir.org/reports/ethics3.pdf
- Tarrant, D., Thereaux, O., & Mezeklieva, V. (2020, June). Anonymising data in times of crisis [Report]. Open Data Institute (ODI). https://theodi.org/article/anonymising-data-in-times-of-crisis/