TEI logo
The logo of the Text Encoding Initiative. Source: Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Text_Encoding_InitiativeTEI_Logo.svg (Creative Commons). Hannah Jones, 2019-04-26.
This page is referenced by:
2019-04-02T13:31:27-07:00
Encoding a Digital Edition of the Chronicle with XML-TEI
2019-05-06T22:52:19-07:00
After transcribing the digital images of the manuscript in FromThePage, we encoded the text in XML (Extensible Markup Language) according to the guidelines of the Text Encoding Initiative (TEI).
The TEI guidelines, which are widely used by scholars and cultural heritage institutions, provide standards for representing texts in a machine-readable format that is also useful for researchers and information professionals in the humanities, the social sciences, and linguistics.
XML was well suited to our project because of its focus on semantic interpretation. Unlike presentation-oriented markup languages such as HTML, XML records the meaning of a textual resource rather than its appearance on a screen, which makes it especially useful for historians. In our project, this semantic functionality allowed us to encode our temporal markers directly in the text, using referential tags and a custom-built taxonomy.
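To make the idea of referential tags pointing at a taxonomy more concrete, the following is a minimal sketch rather than the project's actual encoding. The category ids (`liturgical`, `regnal`), the sample sentence, and the choice of `<seg>` with an `ana` attribute are all assumptions made for illustration; the elements themselves (`<taxonomy>`, `<category>`, `<catDesc>`, `<seg>`) are standard TEI. The Python code reads the taxonomy and counts how often each category is referenced in the text.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified sample: the categories and the sentence are
# invented, and the header omits parts a valid TEI file would require.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <classDecl>
        <taxonomy xml:id="temporal">
          <category xml:id="liturgical">
            <catDesc>Feast days and liturgical seasons</catDesc>
          </category>
          <category xml:id="regnal">
            <catDesc>Dating by a ruler's reign</catDesc>
          </category>
        </taxonomy>
      </classDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <p>On <seg ana="#liturgical">the feast of Saint Martin</seg>,
         in <seg ana="#regnal">the third year of the king</seg>,
         and again at <seg ana="#liturgical">Easter</seg>.</p>
    </body>
  </text>
</TEI>"""


def count_markers(tei_xml: str) -> Counter:
    """Count tagged segments per taxonomy category declared in the header."""
    root = ET.fromstring(tei_xml)
    declared = {cat.get(XML_ID) for cat in root.iterfind(".//tei:category", NS)}
    counts = Counter()
    for seg in root.iterfind(".//tei:seg[@ana]", NS):
        target = seg.get("ana").lstrip("#")  # "#liturgical" -> "liturgical"
        if target in declared:  # ignore pointers to undeclared categories
            counts[target] += 1
    return counts


counts = count_markers(SAMPLE)
```

Because each tag points back to a declared category, a question such as "how many liturgical time references does the text contain?" becomes a short query instead of a manual reading task.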
Another reason for using XML to encode our text was its flexibility. Especially within the TEI guidelines, XML supports multiple levels of granularity within the same text, meaning that users can be very specific with their tags in the places that matter most for their work while following general encoding rules elsewhere. For our project, this meant that we could devote extra time and energy to encoding our temporal markers in as much detail as possible while leaving other aspects of the text untouched. We were thus able to make the most of the short timeline and limited resources of our project, while leaving the code viable overall for anyone who wishes to go into more detail in other areas later.
Finally, XML was ideal for our project because of its independence and interoperability. Because XML files (like .csv and .tif files) do not depend on any particular software to be opened and read, they benefit from greater longevity and stability than many other types of files. Using XML and the TEI guidelines to encode our digital edition also means that it is compatible with a wider array of digital history tools. Just as we hope that our project will give broader exposure to this manuscript, this interoperability will make it easier for future scholars to build on our work.
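As a small illustration of this software independence, the sketch below reads a TEI fragment with nothing but Python's standard library and re-exports its paragraphs as CSV. The fragment, its `n` attributes, and the function name `paragraphs_to_csv` are hypothetical and are not taken from the edition itself.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment: invented text, not taken from the edition.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p n="1">An <hi rend="italic">invented</hi> first paragraph.</p>
      <p n="2">An invented second paragraph.</p>
    </body>
  </text>
</TEI>"""


def paragraphs_to_csv(tei_xml: str) -> str:
    """Return one CSV row per <p>: its n attribute and its plain text."""
    root = ET.fromstring(tei_xml)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["n", "text"])
    for p in root.iterfind(".//tei:p", NS):
        # itertext() gathers text from nested tags; split/join tidies whitespace.
        text = " ".join("".join(p.itertext()).split())
        writer.writerow([p.get("n"), text])
    return buffer.getvalue()


csv_text = paragraphs_to_csv(SAMPLE)
```

The same file could equally be opened in a text editor, transformed with XSLT, or loaded into a TEI-aware tool, which is the practical meaning of interoperability here.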
Using FromThePage's export function, we were able to generate a skeleton XML document consisting of our transcription and the contextual metadata concerning its creation. Although this document required extensive edits, starting from it saved us considerable time throughout the encoding process, as we did not have to create our digital edition from scratch. We then used both Atom, a free open-source text editor, and the Oxygen XML editor to edit the skeleton code provided by FromThePage. The end result was a viable XML edition of our transcription, which is now freely available to the public:
Having finished the encoding process, we then explored ways to make our digital edition suitable for further experimentation via TEI Publisher, a platform that transforms TEI-encoded documents into digital versions that are easy for humans to read.