Trevor Owens, "Defining Data for Humanists: Text, Artifact, Information or Evidence?"
This page is referenced by:
Constructing Data
As models of knowledge, datasets do more than represent information for the purposes of study: they also embody a range of ideas and practices related to the broader politics of representation, inclusion, and access. Therefore, as Trevor Owens of the Library of Congress explains, constructing a dataset requires careful "choices about what and how to collect and how to encode the information" [1]. Building on this framework, this section on constructing data is divided into two parts: collecting and encoding.
Collecting
In her book New Digital Worlds, Roopika Risam writes that "the opportunity to intervene in the digital cultural record—to tell new stories, shed light on counter-histories, and create spaces for communities to produce and share their own knowledges should they wish—is the great promise of digital humanities" [2]. According to this vision of digital humanities, collection building should not be a straightforward matter of digitizing the written and material records as they have been preserved by governments, libraries, and museums. Rather, it should entail critical decisions about: what to collect and why; how to organize, manage, and disseminate collections; and who participates in these processes. This type of critical inquiry has begun to coalesce in new digital collection methods, ranging from participatory, performing, and post-custodial archives to cultural protocols for indigenous archival materials (Native American, Aboriginal and Torres Strait Islander).
Collecting Activity
In the first activity of this section, you will upload five to ten digital objects related to a research or teaching interest of your choosing, and develop a short statement describing your collection. For the purposes of this workshop we will use Scalar, a free and open-source authoring and publishing platform developed by the Alliance for Networking Visual Culture at the University of Southern California [3].
To begin, register an account and duplicate the existing book titled "Portfolio Template DHRI@A-State."
Then, follow the import media directions (linked via the annotation above) to create a media file for each of your selected items. Keep track of where your items come from and title them "Item 1," "Item 2," "Item 3," etc. At this stage, do not add any additional information to your items. We will add titles, descriptions and metadata in the next activity.
After you have imported your collection items, go to the "Data for Humanists" page to write a short statement describing your collection as it currently stands or as you imagine it in expanded form. Navigate to the "Data for Humanists" page by pressing the blue "Continue" button pictured above, or via the main menu tab. Once you get there, press the edit button and begin typing. Your statement should include information about the collection's scope (i.e. subject area, chronological period, geographical area, genre, and/or media type), and a discussion of the collection's objectives in light of the above discussion.
Encoding
For digital collections, item information is frequently encoded in relational databases. A relational database is a structured set of data that contains a series of formally described tables from which data can be queried or organized in different ways. This structure enhances searching, browsing, and exhibition functions by enabling users to access items and reassemble collections based on tabular information.
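As a minimal sketch of the idea, the following Python example uses the standard library's sqlite3 module to put a few invented collection items into a relational table and query them back out; the table layout, field names, and records are illustrative assumptions, not Scalar's actual internal structure.

```python
import sqlite3

# A minimal, hypothetical table of collection items held in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE items (
           id      INTEGER PRIMARY KEY,
           title   TEXT,
           creator TEXT,
           date    TEXT,
           subject TEXT
       )"""
)

# Invented sample records standing in for digitized or born-digital objects.
conn.executemany(
    "INSERT INTO items (title, creator, date, subject) VALUES (?, ?, ?, ?)",
    [
        ("Item 1", "Unknown", "1964", "Gay liberation movement"),
        ("Item 2", "Unknown", "1972", "Student movements"),
        ("Item 3", "Unknown", "1975", "Gay liberation movement"),
    ],
)

# Querying the table reassembles the collection around a shared subject term.
for title, date in conn.execute(
    "SELECT title, date FROM items WHERE subject = ? ORDER BY date",
    ("Gay liberation movement",),
):
    print(title, date)
```

The query at the end is the point: because item information lives in formally described tables, the same collection can be reassembled in different ways on demand.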
To create a collection with these functions, you can either build a relational database from scratch or use a content management system, such as Scalar, that functions as a relational database [4].
Either way, before getting started you will want to develop a metadata schema for your collection. Metadata, or data about data, "provides a means of indexing, accessing, preserving, and discovering digital resources" [5]. While metadata can serve different functions related to these tasks, in this workshop we will be primarily concerned with its descriptive function [6]. Specifically, you will learn how to encode individual items with descriptive metadata and develop a schema to ensure that all of the items in your collection are described consistently to facilitate human and computational analysis.
To create your metadata schema, it is generally advisable to build upon established ontologies and/or controlled vocabularies. Not only will this help you structure your data, but, ultimately, it will increase access and allow you to integrate your collection into Linked Data initiatives. For the purposes of this workshop, you will build a schema based on Dublin Core (DC), which is one of the most popular ontologies for describing digital resources. Dublin Core consists of fifteen elemental terms (contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, and type) and a series of qualified terms that extend or refine the original fifteen elements. Many DC terms, in turn, can be linked to controlled vocabularies, such as those developed by the Library of Congress and the Getty Research Institute, to increase access and provide additional structural support.
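To make the schema idea concrete before turning to critiques, here is a brief Python sketch that stores one invented item record keyed by the fifteen Dublin Core elements and flags any field that falls outside that set; the sample values and the consistency check are illustrative assumptions rather than part of Scalar or the DCMI specification.

```python
# The fifteen Dublin Core elements (DCMES 1.1).
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# A hypothetical item record. Every element is optional and repeatable,
# so values are stored as lists.
item = {
    "title": ["Item 1"],
    "creator": ["Unknown"],
    "date": ["1964"],
    "subject": ["Gay liberation movement"],  # could be drawn from a controlled vocabulary
    "type": ["Image"],                       # e.g. a DCMI Type Vocabulary term
}

# Flag any field names that are not Dublin Core elements so that every
# item in the collection is described consistently.
unexpected = set(item) - DC_ELEMENTS
if unexpected:
    print("Fields outside the DC element set:", sorted(unexpected))
else:
    print("All fields conform to the Dublin Core element set.")
```

A stray field such as "mood" would be reported for review, which is the kind of consistency a written schema otherwise enforces by hand.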
Before building a metadata schema based on Dublin Core, it is worth drawing attention to DC alternatives and critiques. While Dublin Core is explicit in its cross-disciplinary aim to transcend the "boundaries of information silos on the Web and within intranets," you may have reasons to use a more specialized ontology, such as the VRA Core for visual culture, the PBCore for audiovisual content, or the IPTC for media resources [7]. Alternatively, you may have reasons to eschew established ontologies and vocabularies altogether, be it to 'decolonize knowledge' or to "represent uncertainty, historical contingency, conflict, variation, instability, and multidimensionality" [8]. The stakes of eschewal in today's networked knowledge society, however, are high. As Michelle Schwartz and Constance Crompton write about their decision to reject "existing data models and international standards" in Lesbian and Gay Liberation in Canada (an interactive digital resource for the study of lesbian, gay, bisexual, and transgender history in Canada from 1964 to 1981), "in an age where we recognize the right to be forgotten, we must also weigh the danger... of hiding our history" [9].
Encoding Activity
In this activity you will develop a metadata schema for your collection based on Dublin Core, and follow your schema to encode each of your imported items. As the previous discussion should make clear, the objective of this activity is to practice using metadata to structure your collection, not to convince you that all collections should be encoded with Dublin Core.
To create your schema: consult the Dublin Core Elements and Qualifiers manuals; note that each term is optional and can be repeated; and remember that many DC terms can be paired with controlled vocabularies. Your schema should include directions and examples for titling, describing, and adding DC metadata to your collection items (along the lines of the example below).
dcterms: Spatial
Direction: Provide GPS coordinates of the place where the original analogue item was created. You can look up coordinates with an online geocoding tool. Put the decimal-degree (DD) coordinates, separated by a comma, in Scalar's dcterms: spatial field. (Note that GPS coordinates must be formatted in this manner to create maps in Scalar.)
Example: 41.65606, 0.87734
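Because Scalar expects decimal-degree values separated by a comma in this field, a quick check along the lines of the Python sketch below can confirm that a value is well formed before it is entered; the function name and the latitude/longitude bounds are illustrative choices, not part of Scalar.

```python
def is_valid_dd_pair(value: str) -> bool:
    """Check that a string looks like 'latitude, longitude' in decimal degrees."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        return False
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180

print(is_valid_dd_pair("41.65606, 0.87734"))  # True: the example above
print(is_valid_dd_pair("41° 39' 21.8\" N"))   # False: DMS notation, not decimal degrees
```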
After completing your schema on the "Data for Humanists" page of your portfolio, use the search button to navigate to each of your item pages and edit them to include the appropriate titles, descriptions, and metadata.
Data for Humanists
Introduction
'Data' can be a difficult term for humanists. As Miriam Posner of the Department of Information Studies at UCLA explains in "Humanities Data: A Necessary Contradiction": "When you call something data, you imply that it exists in discrete fungible units; that it is computationally tractable; that its meaningful qualities can be enumerated in a finite list; that someone else performing the same operations on the same data will come up with the same results. This is not how humanists think of the material they work with."
Despite discomfort with the term, humanists today engage with data on a regular basis. The data that shapes our professional lives can be defined as "a digital, selectively constructed, machine-actionable abstraction representing some aspects of a given object of humanistic inquiry" [1]. As this definition suggests, the state of our data - and its utility for research - depends on the construction process. For analogue objects, the process begins with digitization. From there, both digitized and born-digital objects need to be described, curated, structured and/or annotated to facilitate human and computational analysis.
In the digital humanities, there are two basic approaches to working with data. The first approach is rooted in the field of big data research. Oriented towards the social sciences, big data research in the digital humanities focuses on "large or dense cultural datasets, which call for new processing and interpretation methods" [2]. The second approach focuses on constructing 'small' datasets that critically engage with - and frequently challenge - traditional classification systems, editorial practices, archives, and canons. Whereas the first approach uses computational methods to perform macro-level analyses, the second uses web-based technologies to publicly redress absences and biases in "how people process and document human cultures and ideas" [3].
Introductory Activity
Explore the project websites for selfiecity and The Caribbean Memory Project. In groups, discuss the following questions:
- How do the projects represent, respectively, the big and small data approaches?
- What are the strengths and weaknesses of the projects?
- What are the potential opportunities and oversights of each approach?
- How might these approaches converge and/or diverge over time?
Workshop Overview
Now that we have examined what data is and how digital humanists engage with it in their scholarship, we will construct and process some datasets of our own.
Constructing Data
"The production of a data set requires choices about what and how to collect and how to encode the information" [4].Processing Data
There are a variety of ways to use computational methods to process data.
Recommended Readings and Tutorials
Guldi, Jo, and David Armitage. "Big Questions, Big Data." In The History Manifesto. Cambridge: Cambridge University Press, 2014. https://www.cambridge.org/core/books/history-manifesto/big-questions-big-data/F60D7E21EFBD018F5410FB315FBA4590.
Social Science Research Council. "Thinking about Data." SSRC Labs: Doing Digital Scholarship, 2018. https://labs.ssrc.org/dds/articles/6-thinking-about-data/.
Van Hooland, Seth, Ruben Verborgh, and Max De Wilde. "Cleaning Data with OpenRefine." The Programming Historian 2 (2013). https://programminghistorian.org/en/lessons/cleaning-data-with-openrefine.
Wueste, Elizabeth. “Big Data, Big Problems.” Eidolon, December 18, 2017. https://eidolon.pub/big-data-big-problems-e09a55bf2b31.