David J. Kim, UCLA
We chose several volumes to begin to ‘data-ize’ the images into a spreadsheet according to an [entity], [relation], [entity] format.10 The process involved first creating a list of all the images within the volumes that contain human subjects, using Curtis' own titles as placeholders. Then, the images were linked to the tribes that they were a "part of," according to Curtis' organization of the volumes by tribe. Lastly, we provided a description for each image, using the Library of Congress description when it was available and otherwise writing our own, with attention to the subject's gender, age and clothing when discernible. We also added columns for these categories, as well as one for “mythical,” after noticing that many titles of the images refer to deities or mythical figures, represented mostly by men in full masks and very elaborate ceremonial gear. While we discussed more nuanced ways of "translating" what one sees in the images and matching it up with the titles that Curtis himself provided, we concluded that, given the limited time and resources available to manually create a dataset from an image collection, as well as the inherent impossibility of fully capturing visual "information" through language, this approach would suffice as a pilot case study. Again, the overall effect we wanted to achieve was to separate what Curtis intended the images to be, using his titles as approximations, from what is represented in the images. Other aspects that we initially considered and attempted to data-ize were the image's affect, photographic genre, facial expressions and "mode-of-address," but these more abstract dimensions of the images were too challenging to capture at the moment.
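As a rough illustration of the [entity], [relation], [entity] format described above, the following Python sketch turns a few image records into rows of that shape. It is not the project's actual workflow, which was carried out by hand in a spreadsheet. The titles, tribes and descriptions are invented placeholders, the "depicts" relation label and the column names are assumptions, and only "part of" comes from the text.

```python
import csv
import io

# Hypothetical records: titles, tribes and descriptions are placeholders,
# not rows from the project's actual spreadsheet.
records = [
    {"title": "Example title A", "tribe": "Tribe X",
     "description": "adult man, ceremonial mask", "mythical": True},
    {"title": "Example title B", "tribe": "Tribe X",
     "description": "young woman, woven blanket", "mythical": False},
]


def to_triples(record):
    """Turn one image record into (entity, relation, entity) rows."""
    return [
        # The image belongs to a tribe, following the volume's organization.
        (record["title"], "part of", record["tribe"]),
        # The image mediates between the title and the described subject.
        # "depicts" is an assumed label, not one taken from the source.
        (record["title"], "depicts", record["description"]),
    ]


triples = [triple for record in records for triple in to_triples(record)]

# Serialize the rows as CSV text, a shape most network-graph tools can ingest.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["source", "relation", "target"])
writer.writerows(triples)
edge_list_csv = buffer.getvalue()
```

Keeping the title and the description as separate entities, joined only through the image, is what lets the later visualization show the gap between what Curtis named and what a viewer describes.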
The procedural aspect of data-izing the set of images focuses on at least momentarily separating the being of the images from their representations. As tedious and time-consuming as this step is, generating a usable spreadsheet from scratch brings attention to the subjective and interpretative nature of data, or as Johanna Drucker states, “data as capta.”11 The collaborative process of data-izing the images, which introduced challenges regarding what to "capture" in our reading of the images and how best to describe in simple language what the images visually communicate to us, only serves as a further reminder that each of us sees and describes the world differently. In the context of the increasing epistemological authority of data in digital environments, in which data is often "assumed to be a 'given' able to be recorded and observed," the manual data-izing process can potentially serve as an opportunity to communicate the simple yet important concept that data is inscription.12 Because data is not simply a given, we wanted to specify clearly that gender, age and clothing, which are themselves often indeterminate, are the categories of description that our dataset focuses on. As long as the process of creating this network representation emphasizes the subjective nature of data and the constructedness of Curtis’ documentation of the tribes he chose to represent, we thought this might be a worthwhile collaborative experiment. This is not to claim that this approach to unpacking the empirical claims of authenticity and objectivity in the archive is without its own set of shortcomings and biases. After all, as the saying goes, "you find patterns where you look for them," but at the very least, the parameters for looking for patterns can and should be specified and justified.13
Once the data had been gathered collaboratively by the graduate and undergraduate students involved in this project, we used Google Fusion Tables to visualize it in the tool's “network graph” mode. Given the relatively small number of quantifiable elements in the dataset, as well as the accessibility of Google Fusion Tables for prototyping, more advanced network analysis tools such as Gephi and Cytoscape seemed unnecessary, at least at this point.
[Visualization 1: Volume 1 of The North American Indian in Google Network Graph]
This visualization, generated by Google's network graph, offers a limited yet useful representation of what we had in mind in terms of modeling the separation and the connection among the tribes, the images and the subjects in the images. The nodes along the inner circle constitute the images, and their labels refer to Curtis' titles. The nodes in the outer circle are the subjects, labeled with descriptions from either the Library of Congress or our own. The graph attests to the mediating function of the images, and it shows how the scale of Curtis' documentation varies from tribe to tribe. Although our reading of the graph was not precisely algorithmic, certain areas of higher connectivity (the nodes with one-to-many relationships), as well as the overall concise and distant view of volume 1, led to the discussions in the following sections of this path.
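The two observations made from the graph, namely that the scale of documentation varies by tribe and that some nodes carry one-to-many relationships, can also be approximated by simple counting. The sketch below is only an illustration under stated assumptions: the rows are invented placeholders rather than data from volume 1, and the "depicts" label and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical (entity, relation, entity) rows; placeholders, not project data.
triples = [
    ("Example title A", "part of", "Tribe X"),
    ("Example title A", "depicts", "adult man, ceremonial mask"),
    ("Example title B", "part of", "Tribe X"),
    ("Example title B", "depicts", "young woman, woven blanket"),
    ("Example title C", "part of", "Tribe Y"),
    ("Example title C", "depicts", "adult man, ceremonial mask"),
]


def count_targets(rows, relation):
    """Count how many images point at each target through one relation."""
    return Counter(target for _, rel, target in rows if rel == relation)


# Scale of documentation: number of images linked to each tribe.
images_per_tribe = count_targets(triples, "part of")

# One-to-many nodes: descriptions shared by more than one image.
shared_descriptions = {
    description: count
    for description, count in count_targets(triples, "depicts").items()
    if count > 1
}
```

Counts of this kind do not replace the visual, distant reading of the graph; they only make explicit the parameters by which a "pattern" is being looked for.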
You can see the datasets for the selected volumes here: volume 1, volume 10 and volume 15.
10. Although only the dataset for volume 1 is discussed here, Amy Borsuk and Beatrice Schuster have also compiled data for volume 10 and volume 15. I would also like to thank Dr. Zoe Borovsky, Librarian for Digital Research and Scholarship at UCLA, for giving me early feedback on our datasets and prototypes. ↩
13. With regard to the efficacy of data and reading for "patterns" for humanistic inquiry, Stephen Ramsay's justification for "algorithmic criticism" in the related context of text analysis is helpful here: "Algorithmic criticism seeks a new kind of audience for text analysis--one that is less concerned with fitness of method and the determination of interpretative boundaries, and one more concerned with evaluating the robustness of the discussion that a particular procedure annunciates," in Stephen Ramsay, Reading Machines: Toward an Algorithmic Criticism (Urbana, IL: University of Illinois Press, 2012), 17. ↩