Teaching Data Fluencies

Capture

This module explores what we mean when we say "data," moving beyond the common assumption that data is raw and objective toward an understanding of data as a product of capture. The module includes theories of capture, relevant readings, an in-class activity on "capturing" data, and a lab session using Python and Google Colab.

 

What is data? How is data captured?


Philip Agre introduced the concept of capture to communication and information studies in his 1994 article "Surveillance and Capture: Two Models of Privacy". In the article, Agre notes that capture is a common term in the vocabulary of software engineers and computing professionals when discussing data, and he contrasts it with the concept of surveillance. When we talk about surveillance, we typically think of people or devices watching us (such as cameras and policing). We can also think about these technologies as performing a form of capture. Capturing involves generating data about an object or activity and reorganizing it to fit a standard language that computers can process. Agre argues that through this reorganization, we are "captured" by these systems both as data and as subjects who rely on the system to operate and generate data.

Thinking of data as captured instead of collected enables a more critical interpretation of what happens when something becomes data. Before data can exist and function as data, it must first be imagined as data. As such, it requires our participation. We use our perception, our imagination, and the tools available to us to define what can be data. Capturing data entails interpretation: we create a data object by combining our imagination and perception of the world with several instruments (e.g., diagrams, documentation, code). Capturing things as data involves making decisions about what is included and what is left out, what matters, and what describes an object, often in a way that is constrained by the tools used.
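These decisions can be made concrete with a minimal Python sketch. Everything in it is a hypothetical illustration rather than part of the module's lab: the classroom scene, the two record types (`SeatingRecord` and `ParticipationRecord`), and the helper `left_out` are assumptions chosen for the example. The same scene is captured twice, and each capture keeps some details while discarding others.

```python
from dataclasses import dataclass, asdict

# A hypothetical classroom scene, described informally before any capture.
scene = {
    "room": "B-204",
    "weather": "rainy",
    "people": [
        {"name": "Ana", "seat_row": 1, "spoke": 3, "mood": "curious"},
        {"name": "Ben", "seat_row": 4, "spoke": 0, "mood": "tired"},
    ],
}


@dataclass
class SeatingRecord:
    """A capture that only cares about where people sit."""
    name: str
    seat_row: int


@dataclass
class ParticipationRecord:
    """A capture that only cares about whether people spoke."""
    name: str
    participated: bool  # a count is reduced to yes/no: a capture decision


def capture_seating(scene):
    return [SeatingRecord(p["name"], p["seat_row"]) for p in scene["people"]]


def capture_participation(scene):
    return [ParticipationRecord(p["name"], p["spoke"] > 0) for p in scene["people"]]


def left_out(records):
    """Return the fields of the original description that a capture omits."""
    described = set(scene["people"][0].keys())
    captured = set(asdict(records[0]).keys())
    return described - captured


seating = capture_seating(scene)
participation = capture_participation(scene)

print(sorted(left_out(seating)))
print(sorted(left_out(participation)))
```

Neither capture records the room or the weather, and neither is more "raw" than the other: each reflects a choice about what matters for a given purpose.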


Analogy to photography

The process of capturing data can be compared to taking pictures. A camera, as a tool, limits what can be framed in a picture, and both the photographer and the camera emphasize some aspects over others. Just as a photo might not precisely reflect reality (e.g., northern lights that appear brighter because of camera settings, or an image improved with filters), data collection tools influence what can be captured and what is emphasized.

 

Data diagrams as a form of language

Technical diagrams used to describe data, such as the entity-relationship diagram (ERD), are both a communication instrument and a form of framing. They are a language that people in the tech industry are trained to understand and use to communicate with each other. They are also a form of abstraction and formalization. Diagrams have properties that allow certain things to be expressed while limiting others.
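As a rough sketch of how a diagram's vocabulary limits what can be expressed, the Python snippet below writes a tiny, hypothetical ERD as a dictionary of entities, attributes, and relationships. The entity names, the attributes, and the helper `fit_to_entity` are all assumptions made up for this illustration. An observation is then fitted to an entity, and anything the diagram has no place for is set aside.

```python
# A hypothetical ERD written as plain Python: entities with their
# attributes, and named relationships between entities.
ERD = {
    "entities": {
        "Student": {"student_id", "name"},
        "Course": {"course_id", "title"},
    },
    "relationships": [
        ("Student", "enrolls_in", "Course"),
    ],
}


def fit_to_entity(entity, observation):
    """Split an observation into what the entity can hold and what it cannot."""
    allowed = ERD["entities"][entity]
    kept = {k: v for k, v in observation.items() if k in allowed}
    dropped = {k: v for k, v in observation.items() if k not in allowed}
    return kept, dropped


# The observation includes a detail the diagram never declared.
observation = {"student_id": 7, "name": "Ana", "commute_minutes": 90}
kept, dropped = fit_to_entity("Student", observation)

print(kept)
print(dropped)
```

The commute time is not wrong or unimportant; it simply has no attribute in this diagram, so it never becomes data unless someone decides to redraw the diagram.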

 

The impact of disciplinary norms and tools

Every discipline has its own norms and tools for capturing and communicating data. For example, in communication studies, we use surveys, infographics, or tools like NVivo to collect and organize data. These choices shape the data at various points of interaction and production.

 




Etymology of the word data


Scholars have questioned whether the word "data" is appropriate for what we call data today. For example, in the 1950s, sociologist Howard Jensen argued that the use of the word "datum" (Latin for "that which has been given") was an "unfortunate accident of history". He argued that, in reality, science deals with "captum" ("that which has been taken"), that is, with what the scientist selects according to their purpose. The term acknowledges the scientist's active role in framing data (Kitchin, 2014). Similarly, Johanna Drucker (2011) proposed that we should think of data as "capta," the plural of "captum": things that are actively taken, not given.

 

History and use of the word data

Daniel Rosenberg's chapter "Data before the Fact," in the book "Raw Data" Is an Oxymoron, complements these arguments by tracing the etymology and historical meaning of the word "data":


Historical Evolution of "Data"

Distinguishing Fact, Evidence, and Data:

 

Data and AI

Kate Crawford (2021) highlights the decontextualization of data in the age of AI, where the complex process of data construction is often overlooked. There is a rush to gather data for AI systems with little discussion about their quality, context, or what they truly represent (e.g., mugshots treated as "ground truth" for facial recognition without understanding the circumstances of the people being arrested).

Understanding the history of the word "data" is crucial because contemporary computational usage, particularly in AI, often treats "everything as data" and so loses sight of the historical tensions surrounding fact, truth, and evidence. When a facial recognition system takes a dataset such as mugshots as "ground truth," it sets aside the context in which the images were produced and the question of what truth, if any, they carry, and it can end up perpetuating flawed theories.

 

Conclusion

The module concludes by challenging common assumptions about data and proposing a more critical understanding:

As Gitelman and Jackson (2014) put it, the phrase "raw data" is an oxymoron: data is never truly raw; it is always "cooked".
