Data for Humanists
When you call something data, you imply that it exists in discrete fungible units; that it is computationally tractable; that its meaningful qualities can be enumerated in a finite list; that someone else performing the same operations on the same data will come up with the same results. This is not how humanists think of the material they work with.
Despite discomfort with the term, humanists today engage with data on a regular basis. The data that shapes our professional lives can be defined as "a digital, selectively constructed, machine-actionable abstraction representing some aspects of a given object of humanistic inquiry". As this definition suggests, the state of our data, and its utility for research, depends on the construction process. For analogue objects, the process begins with digitization. From there, both digitized and born-digital objects need to be curated, structured and/or annotated to facilitate human and computational analysis.
In the digital humanities, there are two basic approaches to working with data. The first approach is rooted in the field of big data research. Oriented towards the social sciences, big data research in the digital humanities focuses on "large or dense cultural datasets, which call for new processing and interpretation methods". The second approach focuses on constructing 'small' datasets that critically engage with, and frequently challenge, traditional classification systems, editorial practices, archives, or canons. Whereas the first approach uses computational methods to perform macro-level analyses, the second uses web-based technologies to publicly redress absences and biases in "how people process and document human cultures and ideas".
Explore the project websites for selfiecity and The Caribbean Memory Project, which are pictured on the right. In groups, discuss the following questions:
- How do the projects represent, respectively, the big and small data approaches?
- What are the potential opportunities and oversights of each approach?
- How might these approaches converge and/or diverge over time?
Now that we have discussed what data is and how digital humanists engage with it in their scholarship, you will construct and present a dataset of your own. For the purposes of this workshop, your dataset will be a collection of digital objects. In subsequent workshops you will engage with texts (Distant Reading) and physical objects (3D Preservation and Presentation) as data. As you go through these workshops, pay attention to common trends as well as the ways in which constructing and presenting data differs across data types.