DHRI@A-State

Constructing Data

As a model of knowledge, datasets not only represent information for the purposes of study.  They also embody a range of ideas and practices related to the broader politics of representation, inclusion, and access. Therefore, as Trevor Owens of the Library of Congress explains, constructing a dataset requires careful "choices about what and how to collect and how to encode the information" [1]

What and How to Collect

Roopika Risam (on reproducing colonial knowledge and also on archive that contextualize colonial knowledge- think of Vanishing Race)
Participatory and Community Based archive- example Lesbian Herstory Archives pg 140 Intersectional Feminism
Posner radical new digital humanities (shifts focus from mainstream to marginalized but also develops ways of representing people's lives in data "as they have been experienced, not as they have been captured and advanced by businesses and governments".


How to Encode Information

For digital collections, item information is frequently encoded in relational databases.  A relational database is a structured set of data that contains a series of formally described tables from which data can be queried or organized in different ways. This structure enhances searching, browsing, and exhibition functions by enabling users to access items and reassemble collections based on tabular information.
To create a collection with these functions, you can either build a relational database or use a content management system (CMS) that functions as a relational database [2]

Either way, before getting started you will want to develop a metadata schema for your collection. Metadata, or data about data, "provides a means of indexing, accessing, preserving, and discovering digital resources" [3]. While metadata can serve different functions related to these tasks, in this workshop we will be primarily concerned with its descriptive function [4]. Specifically, you will encode individual items with descriptive metadata and develop a schema to ensure that all of the items in your collection are described consistently to facilitate human and computational analysis.

To create your metadata schema, it is generally advisable to build upon established ontologies and/or controlled vocabularies. Not only will this help you structure your data, but, ultimately, it will allow you to integrate your collection into Linked Data initiatives. One of the most popular all-purpose ontologies for describing digital resources is Dublin Core (DC). The ontology contains fifteen elemental terms (in italics below) and a series of qualified terms that extend or refine the original fifteen elements. 

If you decide to develop your schema based on DC: consult the Elements and Qualifiers manuals; note that each term is optional and can be repeated; and keep in mind that you can link DC terms, such as "subject," to controlled vocabularies from the Library of Congress or Getty Institute. In addition to Dublin Core, there are also specialized ontologies for visual culture (VRA Core) audiovisual content (PBCore), and media resources (IPTC), among others.  
 


 

This page has paths:

  1. Data for Humanists Andrea Davis

This page references: