DHSHX

Choosing a Corpus & preparing it for use | VEP Drama as Case Study

We are fortunate to live at a time when editions of Shakespeare are all over the internet: you can easily run a "Control-F" search on a text online and see exactly where certain words appear within a given work by the playwright (or, of course, by other writers).

But the specifics of the texts you choose to search matter quite a bit, something this book covers in some detail in its sections on the print industry in Shakespeare's England. When texts appear online, their formatting and content determine how useful they will be. Their usefulness also depends on your interests and goals: what do you want to learn from them, and why? Frustratingly, the answer to many questions about how to do specific tasks with corpora is "it depends." In another part of this book, we explained what a "corpus" is and the central role a corpus plays in digital approaches to literary works. We have also provided a set of links to some of the "best" Shakespeare corpora available to you for free online. But "best" is not one-size-fits-all, and as you work through this textbook we encourage you to think about the pros and cons of each resource we introduce and how it fits your pedagogical goals. We now invite you to think about how Shakespeare and the many digital resources associated with him relate to his contemporaries.
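To make concrete why a text's formatting and spelling affect what a search can find, here is a minimal Python sketch of a whole-word, case-insensitive search, similar in spirit to a "Control-F" search. The function name `find_word` and the two sample strings are illustrative assumptions, not part of any corpus discussed on this page; the first string imitates early modern spelling of the opening of Sonnet 18, and the second is a modernized version of the same lines.

```python
import re

def find_word(text, word):
    """Return (line_number, line) pairs where `word` appears as a whole word,
    ignoring case. `find_word` is a hypothetical helper for illustration."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]

# Illustrative sample text in early modern spelling ("louely" for "lovely").
original = "Shall I compare thee to a Summers day?\nThou art more louely and more temperate:"
# The same lines with modernized spelling.
modernized = "Shall I compare thee to a summer's day?\nThou art more lovely and more temperate:"

print(find_word(original, "lovely"))    # []
print(find_word(modernized, "lovely"))  # [(2, 'Thou art more lovely and more temperate:')]
```

The search for "lovely" finds nothing in the unmodernized text because the word is spelled "louely" there. This is one reason modernized or regularized corpora can be easier to search, at the cost of some distance from the text as it was originally printed.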

Shakespeare has the benefit of being widely taught and widely digitized, which is part of the reason we focus specifically on him. However, since the first 25,000 texts of the Early English Books Online Text Creation Partnership (EEBO-TCP) were released into the public domain on January 1, 2015, we have increasingly been able to look at Shakespeare and his dramatic contemporaries in ways that were previously rather difficult. There has been a boom in curating and releasing corpora related to Shakespeare's dramatic contemporaries, including Mueller's Shakespeare his Contemporaries (2015), Ralston and Hope's 554-, 704-, and 912-play corpora, the Digital Renaissance Editions (Hirsch, ed., 2014-), and Brown's A Digital Anthology of Early Modern English Drama (2016).

While we encourage you to look beyond Shakespeare's plays, many of the exercises we model use cleaned-up, wholly modernized texts based on the EEBO-TCP Phase I transcriptions, provided by the Visualising English Print project (funded by the Andrew W. Mellon Foundation; University of Wisconsin-Madison, University of Strathclyde, Folger Shakespeare Library). Because the EEBO-TCP transcriptions have undergone several transformations before appearing in their final form, it is important to be conscious of provenance and of the transformations digitized materials must go through before they are fully machine-actionable.
The Visualising English Print project provides several corpora of English drama from the 16th and 17th centuries, and across several pages, the team responsible for them explains which texts they include and what has been done to the files containing those texts before you access them.

Take a few minutes to read about these corpora now. Once you have read the project's description and any related pages, you can prepare for the next step, downloading the files, which is discussed on the next page in this stream.
