DHRI@A-State

Data for Humanists

'Data' can be a difficult term for humanists. As Miriam Posner of the Department of Information Studies at UCLA explains in "Humanities Data: A Necessary Contradiction:”

When you call something data, you imply that it exists in discrete fungible units; that it is computationally tractable; that its meaningful qualities can be enumerated in a finite list; that someone else performing the same operations on the same data will come up with the same results. This is not how humanists think of the material they work with.

Despite discomfort with the term, humanists today engage with data on a regular basis. The data that shapes our professional lives can be defined as "a digital, selectively constructed, machine-actionable abstraction representing some aspects of a given object of humanistic inquiry" [1]. As this definition suggests, the state of our data - and its utility for research - depends on the construction process. For analogue objects, the process begins with digitization. From there, both digitized and born-digital objects need to be curated, structured and/or annotated to facilitate human and computational analysis. 

Data and the Digital Humanities

In the digital humanities, there are two basic approaches to working with data. The first approach is rooted in the field of 'big data' research. Oriented towards the social sciences, big data research in the digital humanities focuses on "large or dense cultural datasets, which call for new processing and interpretation methods" [2]. The second approach focuses on constructing 'small' or 'smart' datasets that critically engage with - and frequently challenge - traditional classification systems, archives, and cannons.  Whereas the first approach uses computational methods to perform macro-level analyses, the second uses web-based technologies to redress perceived absences and biases in "how people process and document human cultures and ideas" [3]

Activity

Part I

Explore the project websites for selfiecity and The Caribbean Memory Project, which are pictured on the right. In groups, discuss how each project represents an above-mentioned approach. 

Part II

In "Big? Smart? Clean? Messy? Data in the Humanities," Christof Schöch argues that digital humanists need to combine big and smart data approaches: 

We need smart big data because it can not only adequately represent a sufficient number of relevant features of humanistic objects of inquiry to enable the level of precision and nuance scholars in the humanities need, but it can also provide us with a sufficient amount of data to enable quantitative methods of inquiry that help us transgress the limitations inherent in methods based on close reading strategies. To put it in a nutshell: only smart big data enables intelligent quantitative methods.

Explore Voyages: The Trans-Atlantic Slave Trade Database. In groups, discuss whether - and if so to what extent - the project successfully combines big and smart data approaches. Do you agree with Schöch that this is how the digital humanities should proceed? Why or why not?

Constructing Humanities Data

The rest of this workshop will focus on constructing humanities data, beginning with a discussion of classification systems and standards for describing digitized and born-digital objects. The objective of "The production of a data set requires choices about what and how to collect and how to encode the information" "We can choose to treat data as different kinds of things. First, as constructed things, data are a species of artifact. Second, as authored objects created for particular audiences, data can be interpreted as texts. Third, as computer-processable information, data can be computed in a whole host of ways to generate novel artifacts and texts which are then open to subsequent interpretation and analysis." [4]



Recommended Readings and Tutorials

Seth van Hooland, Ruben Verborgh, and Max De Wilde, "Cleaning Data with OpenRefine," The Programming Historian 2 (2013), https://programminghistorian.org/en/lessons/cleaning-data-with-openrefine.

Social Science Research Council, "Thinking about Data," SSRC Labs: Doing Digital Scholarship (2018), https://labs.ssrc.org/dds/articles/6-thinking-about-data/.

This page references: