DHSHX

Glossary of Terms For Digital Approaches

Collocate: a word that habitually appears near another word; as a verb, to occur together in typical combinations (for example, "strong" and "tea").

Concordance: a list of all of the occurrences of a given word in a text or set of texts. The list usually includes some context for each occurrence. AntConc, a freeware concordance program, describes this as Key Words In Context (KWIC).
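A concordance can be sketched in a few lines of code. The Python example below is a minimal illustration, not how AntConc itself works: the function name kwic, the width parameter, and the sample sentence are all assumptions made for this sketch. It lists every occurrence of a keyword with a few words of context on either side.

```python
import re

def kwic(text, keyword, width=3):
    """Return every occurrence of keyword with `width` words of context per side."""
    # Lowercase the text and keep only runs of letters and apostrophes as words.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Mark the keyword with brackets so it stands out from its context.
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Hypothetical sample text, used only for illustration.
sample = "The cat sat on the mat. The dog chased the cat."
for line in kwic(sample, "cat", width=2):
    print(line)
```

Dedicated concordance programs add sorting, alignment, and search patterns on top of this basic idea.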

Corpus (plural: corpora): from the Latin for "body"; in corpus linguistics, a corpus is a collection of texts assembled for analysis.

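Creative Commons: a nonprofit organization that provides free copyright licenses; creators use these licenses to state how their work may be shared and reused.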

Folio: describes a book created using sheets of paper that have been folded once.

N-gram: a sequence of n consecutive words. The most basic n-gram is a unigram, which is a single word. A bigram is a sequence of two words, a trigram is a sequence of three, and sequences of more than three words are four-grams, five-grams, and so on. This is an important concept for text mining.
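As a rough illustration of the n-gram definition, the Python sketch below splits a sentence into words and slides a window of n words across it. The function name ngrams and the sample phrase are assumptions made for this example; real text-mining tools usually handle punctuation and tokenization more carefully.

```python
from collections import Counter

def ngrams(text, n):
    """Return all sequences of n consecutive words in text, as tuples."""
    # Simple whitespace tokenization; punctuation handling is left out of this sketch.
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sample phrase, used only for illustration.
sample = "to be or not to be"
unigrams = ngrams(sample, 1)
bigrams = ngrams(sample, 2)

# Counting repeated n-grams is a common first step in text mining.
bigram_counts = Counter(bigrams)
print(bigram_counts.most_common(1))
```

Here the bigram "to be" appears twice, which is the kind of repeated pattern that n-gram counts are meant to surface.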

Quarto: describes a book created using sheets of paper that have been folded twice.

TEI: Text Encoding Initiative; a consortium that develops guidelines for encoding texts so that they are machine-readable.

XML: Extensible Markup Language; a markup language used for encoding texts. XML files can be edited in a variety of software programs, such as Oxygen and Adobe Dreamweaver, and XSL stylesheets can convert XML into other formats such as HTML.
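To show what "machine-readable" means in practice, the Python sketch below reads a small XML fragment with the standard library's xml.etree.ElementTree module. The element names (text, title, line) and the n attribute are simplified inventions for this example, not actual TEI markup.

```python
import xml.etree.ElementTree as ET

# A hypothetical, simplified encoding; real TEI documents use a richer vocabulary.
xml_text = """<text>
  <title>Hamlet</title>
  <line n="1">To be, or not to be</line>
  <line n="2">that is the question</line>
</text>"""

root = ET.fromstring(xml_text)

# Because the structure is explicit, a program can pull out exactly the parts it needs.
title = root.find("title").text
lines = [(el.get("n"), el.text) for el in root.findall("line")]

print(title)
for number, content in lines:
    print(number, content)
```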
