Archaeology of a Book: An experimental approach to reading rare books in archival contexts

Path: (Digital) Futures

What is the future of the printed book in the digital age? In this project, we have explored this question by considering the long history of interaction between books and their material conditions - whether those conditions are the scene of their production, the site of their collection, or the process of their acquisition. At the same time, we have sought to illustrate some ways in which digital collections like the Primeros Libros project can allow for new paths into book history, creating opportunities for research and for writing.
To conclude this project, we want to turn our attention to the conditions of digitization that produced the online Advertencias repository. The Primeros Libros project was established in 2010 as a partnership between five institutions: the Biblioteca Palafoxiana (Puebla, MX), the Biblioteca Lafragua (Puebla, MX), the Biblioteca Franciscana (Cholula, MX), the Benson Latin American Collection (Austin, TX, USA), and the Cushing Library (College Station, TX, USA). In its initial phase, the goal was to create digital facsimiles of fifty-eight books held by the partner institutions. The collection currently has 384 books in various stages of processing from 23 institutions around the world.
What does this mean for the Advertencias? Though several libraries have multiple copies of the book, the digital collection enables a kind of comparative work that has never before been possible. It brings to the forefront the book's qualities as an artifact in the history of print production in the Americas, situating it primarily within the context of American print culture. Other kinds of history, however, are embedded in this collection: the history of circulation, collection, acquisition, dissemination. We have sought to show where these material conditions intersect with the work's digital presence.
Here, we want to consider how the Primeros Libros collection enables new methods for the study of printed books like the Advertencias by examining two tools for digital analysis developed in conjunction with the Primeros Libros project.
Cobre is a comparative book reader for the Primeros Libros developed by Texas A&M University. It enables readers to explore multiple options for engaging with digital facsimiles, from a reader interface that mimics a printed book to a comparative "filmstrip" that allows for the side-by-side comparison of multiple copies. In the example below, the Cobre reader makes visible a difference in the frontmatter of two copies of the Advertencias (volume one). One copy begins the book immediately after the dedication, while the other has significantly more material, including indulgences dated 1603.
In the proceedings of the 45th Hawaii International Conference on System Sciences (HICSS, 2012), the developers of Cobre describe in detail their process and goals for the project. After spending several years collecting "user stories" in collaboration with the Asociación Mexicana de Bibliotecas e Instituciones con Fondos Antiguos, they set out to develop a tool that would embrace the "Frankenbook": the wide array of variations between multiple volumes of a single text.
Ocular is a tool for the automatic transcription of early modern printed books. The tool, which was developed by Taylor Berg-Kirkpatrick and others at U.C. Berkeley in 2013, is an Optical Character Recognition (OCR) tool designed specifically to handle the challenges of hand-printed, worm-eaten books. Unlike OCR systems designed for new books, Ocular statistically models the variations in fonts, alignment, and inking of letters that are byproducts of early printing presses. It also draws on a language model, a statistical model of what language should look like, to fill in information about distorted letters.
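To make the role of a language model more concrete, the following is a minimal sketch of the general idea, not Ocular's actual implementation. It assumes a toy character-bigram model trained on an invented Spanish-like sentence, and a hypothetical set of "visual scores" standing in for how closely a smudged glyph resembles each candidate letter. All function names and values are illustrative assumptions.

```python
import math
from collections import Counter


def train_bigram_model(text):
    """Count character bigrams and single characters in a training text."""
    bigrams = Counter(zip(text, text[1:]))
    unigrams = Counter(text)
    return bigrams, unigrams


def bigram_log_prob(prev, char, bigrams, unigrams, vocab_size):
    """Add-one smoothed estimate of log P(char | prev)."""
    return math.log((bigrams[(prev, char)] + 1) / (unigrams[prev] + vocab_size))


def resolve_letter(prev, candidates, bigrams, unigrams, vocab_size):
    """Choose the candidate with the best combined visual and language score."""
    return max(
        candidates,
        key=lambda c: math.log(candidates[c])
        + bigram_log_prob(prev, c, bigrams, unigrams, vocab_size),
    )


# Invented training text (an assumption, not drawn from the Primeros Libros).
training_text = "que el que quiere querer aquello que quiso"
bigrams, unigrams = train_bigram_model(training_text)
vocab_size = len(set(training_text))

# Hypothetical visual scores: the damaged glyph looks slightly more like "n".
visual_scores = {"u": 0.4, "n": 0.6}

# The preceding letter is "q", which the training text always follows with "u".
best = resolve_letter("q", visual_scores, bigrams, unigrams, vocab_size)
print(best)
```

In this toy case the language model outweighs the slightly misleading visual evidence, because the letter "q" is never followed by "n" in the training text. A real system works with far richer models of both the page image and the language.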
As of 2015, Ocular is the state of the art for automatic transcription. When we tried to use it on the Primeros Libros texts, however, it failed because its language model expected monolingual English documents, something like the Wall Street Journal. In 2014, we collaborated with computer scientists at U.T. Austin and at Berkeley to expand Ocular for Primeros Libros by adding multilingual capabilities, as well as an interface for handling the irregular orthography (spelling, punctuation, shorthand, etc.) characteristic of early modern printing. The result is a tool that can automatically transcribe books in multiple languages, like the Advertencias, while simultaneously identifying patterns in language switching embedded in the books. The image below shows a fragment of a page from the Advertencias, a processed facsimile, and an automatic transcription.
Automatic transcription for early modern texts like the Advertencias is still an arduous process: in many cases, accuracy is less than fifty percent. Being able to rapidly transcribe large bodies of text like that of the Primeros Libros project, however, will enable new ways of "discovering" the texts. We will be able to analyze linguistic patterns in indigenous languages, for example, or identify places where authors borrow from one another. When paired with close analyses of the original books, we hope that this will reveal new aspects of the books' role in history, and their position as parts of history.
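As one illustration of how transcriptions might be used to identify places where authors borrow from one another, the sketch below compares two texts by the three-word sequences they share. This is a simple, commonly used approach offered as an assumption about how such an analysis could begin, not a description of our own workflow. The two sample passages are invented placeholders, and the function names are hypothetical.

```python
def word_ngrams(text, n=3):
    """Return the set of n-word sequences found in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(text_a, text_b, n=3):
    """List the n-word sequences that appear in both texts."""
    return sorted(word_ngrams(text_a, n) & word_ngrams(text_b, n))


def jaccard_similarity(text_a, text_b, n=3):
    """Share of all distinct n-word sequences that the two texts have in common."""
    a, b = word_ngrams(text_a, n), word_ngrams(text_b, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented placeholder passages (not transcriptions from the collection).
text_a = "en el nombre del padre y del hijo amen"
text_b = "digan todos en el nombre del padre con devocion"

shared = shared_passages(text_a, text_b)
for sequence in shared:
    print(" ".join(sequence))
print(round(jaccard_similarity(text_a, text_b), 3))
```

Because exact matching depends on identical spelling, a transcription with many errors or with irregular orthography would hide much of the overlap. In practice, an analysis of this kind would likely need some form of approximate matching.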

