Online Text Repositories
The following list only covers large-scale repositories that have broad public appeal. But make sure you don’t forget about the smaller digital archives and online exhibitions from libraries, museums, galleries, and other cultural institutions. Start with this massive list if you need ideas!
Project Gutenberg, which offers free online ebooks (over 50,000 at present), is most likely the first online text repository that most of us came across. The primary Project Gutenberg includes only texts that are out of copyright under U.S. law, but make sure to browse its Affiliate Resources and Projects for similar projects based in other nations (which therefore follow other copyright laws and/or offer more texts in languages other than English). Project Gutenberg offers flexibility, as it allows users to choose from many formats (including plain text, HTML, PDF, EPUB, and Kindle). However, these texts derive from what is colloquially referred to as “dirty OCR”: the text itself comes from the uncorrected output of Optical Character Recognition software run on images (pictures) of physical books. As a result, many “Gutenberg editions” do not meet the standards of accuracy and reliability of scholarly editions, which benefit from the knowledge of academics and the sharp eyes of professional editors and copyeditors. Some texts do come with data about their provenance (which edition was used, for example), and some benefit from volunteer organizations of proofreaders (such as Distributed Proofreaders), but this is not always the case. Keep these caveats in mind if you assign a Gutenberg text for course readings, use it for reference during class sessions, or cut and paste the text for use in text analysis tools.
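If you do paste a Gutenberg plain-text file into a text analysis tool, the license header and footer can skew word counts. The following is a minimal Python sketch, not an official Project Gutenberg utility: the function name strip_gutenberg_boilerplate is hypothetical, and the marker wording ("*** START OF THE/THIS PROJECT GUTENBERG EBOOK") is an assumption based on how these files are commonly laid out, so check it against the file you actually download. It does nothing to correct OCR errors in the text itself.

```python
import re

# Assumption: the book text sits between marker lines that begin
# "*** START OF THE/THIS PROJECT GUTENBERG EBOOK" and
# "*** END OF THE/THIS PROJECT GUTENBERG EBOOK". Verify against your file.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return the text between the start and end markers.

    If a marker is missing, fall back to the beginning or end of the
    input, so a file without markers is returned unchanged (trimmed).
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() >= begin else len(raw)
    return raw[begin:finish].strip()


# Tiny invented sample standing in for a downloaded file.
sample = (
    "The Project Gutenberg eBook of Example\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Call me Ishmael.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License text follows.\n"
)
print(strip_gutenberg_boilerplate(sample))
```

In practice you would read the downloaded file into a string, pass it through the function, and spot-check the first and last lines of the result before running any analysis.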
The Internet Archive, a nonprofit company based in San Francisco and dating from 1996, envisions itself as the library of the future, providing free, open-access materials (including textual, visual, audio, and video texts). It archives the internet itself, as well as digitized assets. Quite ambitiously, it promises nothing less than universal access to all knowledge, thus invoking the activist tone and interest in legal matters related to copyright and access that is typical of OA (open access) advocacy. It also curates a fantastic bibliography of similar online library projects. Kalev Leetaru’s “Reimagining Libraries In The Digital Era: Lessons From Data Mining The Internet Archive” is a great example of the sort of advanced digital humanities work that can be done with resources from the Internet Archive. For an important discussion of the ethical stakes of creating, defining, and using archives of all kinds, see Jacqueline Wernimont's Justice and Digital Archives working bibliography.
The HathiTrust Digital Library is a collaborative repository that combines the powers of over 100 libraries and institutions and hosts about 14 million digitized texts (including some from GoogleBooks and the Internet Archive). Items in the public domain are open for anyone to view (typically in many different formats), but if your institution is a member of the partnership (check here), you may have additional opportunities for interacting with HathiTrust’s corpus. Practically speaking, for most of us, the primary difference between a user at a member institution and a user elsewhere is the ability to download an entire text, rather than simply view texts and search the entire repository. For more advanced digital humanists, access to HathiTrust’s datasets can be useful for larger data mining or topic modeling projects.
You are no doubt familiar with GoogleBooks, which calls itself “the world's most comprehensive index of full-text books,” as a convenient portal for finding books (and for hoping to find an out-of-copyright text that you can read right in your browser without further action on your part). It is a complicated service with a complicated history, as we indicate in Digital Humanities in the Classroom; GoogleBooks helpfully provides an overview of these issues. Jonathan Band’s “The Google Library Project: Both Sides of the Story” provides a scholarly summary of the legal issues, while CNET and The Atlantic have both published accounts aimed at a broader public audience. The very different perspectives of the digital humanities and librarianship communities toward the GoogleBooks controversy are summarized in Jessamyn West’s “Google’s slow fade with librarians.”
Copyright Resources
For guidelines specifically tailored to your institution and/or region, you may wish to consult any resources that are made available through your university’s legal department. Some excellent university resources available online also contain helpful guides applicable to a wide variety of situations. These include Stanford University’s Copyright Centre, which has excellent guides to Fair Use, Educational Use, and other matters (http://fairuse.stanford.edu/), and, for the Canadian higher-education context, The University of British Columbia’s detailed resource on copyright in the classroom, which includes helpful flow charts.
Another reliable source of information on copyright is national government resources, which contain the full legal documents on copyright. The official copyright acts are easily available. The acts, however, are not tailored to the classroom and often require a lot of reading in order to find the sections most relevant to educational use. Usually, therefore, you will not need to access this level of detail. However, reading the acts in full can help you feel fully confident about your use of materials. These resources are available online here for Canada, the UK, and the US.
Examples of Best-Case Digital Resources
We stressed in Chapter 2 that the best digital resources include tools, apps, or guides to help you navigate their content and use them in the classroom as creatively and efficiently as possible. We highlight the Digital Public Library of America (DPLA) as a best-case scenario for digital resources beyond text repositories. Make sure to browse its educational resources for ideas about how to incorporate it in your courses, and if you feel ambitious, DPLA boasts many ways for you (and your class) to get involved. We recommend its video tour for a more comprehensive look at the opportunities it opens up, as well as the group of apps that users have developed to interact with DPLA in creative, fun ways. Look also to other major libraries, most of which have developed public outreach projects that you can teach with. The New York Public Library, for example, maintains a variety of helpful pedagogical resources, including Tools for Teaching and a blog for teachers updated (roughly) each month.
In general, digital humanities projects often boast an interesting suite of ideas for research and teaching based on their data. University College London’s Bentham Project, an innovator in crowdsourcing, has used volunteer labor to assist its primary aim of producing a new complete edition of the works of Jeremy Bentham. The project began decades ago, but it continues to flourish and create new opportunities for digital user involvement (particularly through the Transcribe Bentham platform, by which ordinary readers can help convert images of manuscripts into encoded plain text). In your class, you can use its impressive suite of tools to teach students about Bentham, about manuscript culture, about the digital humanities, or about 18th-century cultural history.
One project similar in size and scope to the Bentham Project, The Mark Twain Project Online (MTPO), supported by the University of California system, aims to create a full digital edition of the works of Mark Twain (letters as well as published fiction and nonfiction). Along the way, the MTPO has constructed a suite of Research Resources that can enhance lessons related to American literature or history. In the British context, the fascinating Map of Early Modern London (MoEML), which started its life as a digital edition of a 1561 map, now includes a gazetteer, encyclopedia, and other resources that support the teaching of English literature and history and of mapmaking and visual history. Like the Bentham Project, MoEML solicits user help in ways that can be appropriate and exciting for your students. The project describes these opportunities here, but perhaps even more useful is the cluster of documents designed specifically to help you teach with MoEML. Additionally, its Praxis section offers thorough, accessible documentation and tutorials for more advanced users, making it a powerful source for anyone teaching the digital humanities or textual encoding.
To find more resources, refer to this comprehensive list of digital humanities projects seeking collaborators and other types of help (hosted by DHCommons) to begin browsing for “good fits” for your courses. If you wish to use projects that are already complete and ready for full use in the classroom, you can also find interesting projects in the lists of nominated and winning projects from the annual DH Awards competition. The European Association for Digital Humanities also hosts a comprehensive list of fantastic, multimedia DH projects that are ideal for classroom use. If you wish to see projects of local significance, you can also search your institution’s website for the DH projects it hosts (particularly if it boasts a center for DH; check centerNet if you do not know) or for lists that your librarians have put together of DH projects they find useful (NYU has one, for example). If you do not have a center or allied projects at your institution, browse grant organizations for award-winning DH projects in your country (in the U.S., such entities include the National Endowment for the Humanities’ Office of Digital Humanities and the American Council of Learned Societies’ Digital Innovation Fellowships).