Glossary of Terms for Digital Approaches
Concordance: a list of every occurrence of a given word in a text or set of texts, usually showing each occurrence with some of its surrounding context. AntConc, a freeware concordance program, calls this display Key Word in Context (KWIC).
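To make the KWIC idea concrete, here is a minimal Python sketch of a concordance. It assumes a simple tokenization (lowercased runs of word characters) and a fixed window of words on either side of the keyword. The function name `kwic` and the sample sentence are illustrative only; programs such as AntConc offer many more options.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword.

    Tokens are lowercased runs of word characters; this simple rule is an
    assumption for illustration, not how any particular program tokenizes.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits

# Print each hit with the keyword in a central column, KWIC style.
for left, key, right in kwic("The cat sat on the mat. The dog saw the cat.", "cat", window=2):
    print(f"{left:>12}  {key.upper()}  {right}")
```

Returning tuples rather than formatted strings keeps the context available for sorting, for example by the word to the right of the keyword.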
Corpus (plural, Corpora): from the Latin for "body" (plural "bodies"); in corpus linguistics, a corpus is a collection of texts assembled as a sample for analysis.
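Creative Commons: a set of standardized public copyright licenses that let creators allow others to share and reuse their work, including digital materials, under stated conditions such as attribution.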
Folio: describes a book created using sheets of paper that have been folded once.
N-gram: a contiguous sequence of n words. The most basic n-gram is a unigram, a single word; a bigram is a sequence of two words and a trigram a sequence of three, while longer sequences are called four-grams, five-grams, and so on. Counting n-grams is a basic technique in text mining, used for example to find frequently recurring phrases.
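A minimal Python sketch of n-gram extraction, assuming the text has already been split into word tokens (here with a plain whitespace split). The function name `ngrams` is hypothetical; `collections.Counter` is used to tally how often each n-gram occurs.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens, as tuples, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "to be or not to be".split()
print(ngrams(tokens, 1))  # unigrams: single words
print(ngrams(tokens, 2))  # bigrams: pairs of adjacent words
print(Counter(ngrams(tokens, 2)).most_common(1))  # → [(('to', 'be'), 2)]
```

A text of k tokens yields k - n + 1 n-grams, so the six tokens above give five bigrams and four trigrams.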
Quarto: describes a book created using sheets of paper that have been folded twice.
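The difference between folio and quarto comes down to simple arithmetic: each fold in half doubles the number of leaves cut from a sheet, and each leaf has two pages (front and back). The sketch below assumes every fold halves the sheet, which holds for these two formats; the function name is hypothetical.

```python
def leaves_and_pages(folds):
    """Leaves and pages per sheet when a sheet is folded in half `folds` times."""
    leaves = 2 ** folds  # each fold doubles the number of leaves
    pages = 2 * leaves   # each leaf has a front (recto) and a back (verso)
    return leaves, pages

for name, folds in [("folio", 1), ("quarto", 2)]:
    leaves, pages = leaves_and_pages(folds)
    print(f"{name}: {leaves} leaves, {pages} pages per sheet")
```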
TEI: Text Encoding Initiative; a consortium that develops and maintains the TEI Guidelines, a standard for encoding texts in XML so that they are machine-readable.
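A TEI file is ordinary XML in the TEI namespace (`http://www.tei-c.org/ns/1.0`), with a `teiHeader` holding metadata and a `text` element holding the encoded text. The minimal sketch below parses a short, invented sample with Python's standard `xml.etree.ElementTree` to pull out the title, the tagged personal names, and the paragraph text. The sample content and the function name `summarize_tei` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix used in search paths; the URI is the TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A short, invented TEI document used only for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Letter</title></titleStmt>
      <publicationStmt><p>Unpublished</p></publicationStmt>
      <sourceDesc><p>Invented for this example</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Dear <persName>Anne</persName>, the weather is fine.</p>
      <p>Yours, <persName>Tom</persName></p>
    </body>
  </text>
</TEI>"""

def summarize_tei(xml_text):
    """Extract the title, tagged personal names, and body paragraphs from a TEI string."""
    root = ET.fromstring(xml_text)
    title = root.findtext("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=TEI_NS)
    names = [el.text for el in root.iterfind(".//tei:persName", TEI_NS)]
    # itertext() joins all text inside a <p>, including text in nested tags.
    paragraphs = ["".join(p.itertext()) for p in root.iterfind(".//tei:body/tei:p", TEI_NS)]
    return {"title": title, "names": names, "paragraphs": paragraphs}

print(summarize_tei(SAMPLE_TEI))
```

Because the markup records that "Anne" and "Tom" are personal names, a program can list them directly instead of guessing from capitalization.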
XML: Extensible Markup Language; a markup language used for encoding texts. XML files can be created and edited in many programs, such as Oxygen and Adobe Dreamweaver, and XSL stylesheets (specifically XSLT) can transform XML into other formats, such as HTML.
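An XSLT stylesheet is a set of template rules stating what output to produce for each XML element it matches. Python's standard library includes an XML parser but no XSLT processor, so the minimal sketch below hand-codes the same kind of element-by-element mapping with `xml.etree.ElementTree` rather than using a real stylesheet. The `<poem>`, `<title>`, and `<line>` elements and the function name `poem_to_html` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; the element names are invented for this example.
POEM_XML = """<poem>
  <title>Untitled</title>
  <line>First line of verse</line>
  <line>Second line of verse</line>
</poem>"""

def poem_to_html(xml_text):
    """Map <title> to <h1> and each <line> to <p>, as an XSLT template rule might."""
    poem = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    heading = ET.SubElement(body, "h1")
    heading.text = poem.findtext("title")
    for line in poem.iterfind("line"):
        para = ET.SubElement(body, "p")
        para.text = line.text
    return ET.tostring(html, encoding="unicode", method="html")

print(poem_to_html(POEM_XML))
```

Keeping the content in XML and generating HTML from it means the same source can later be transformed into other formats without re-encoding the text.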