Workbook for Introduction to Digital Humanities: A-State

Aimie Michelle's Text Analysis

The Portuguese philosopher António Quadros argued that there are certain words central to the generation of national identities, or "...expressions that are 'mothers,' words that conceal and at the same time reveal a long and mysterious experience that is supra-individual and trans-temporal." [1]  Quadros identified the ten "mother-words" of Portuguese identity as Mar, Nau, Viagem, Descobrimento, Demanda, Oriente, Amor, Império, Saudade, Encoberto (in English: Sea, Ship, Journey, Discoveries, Quest, East, Love, Empire, Yearning, Secret). [2]  While Quadros was perhaps making an individual assessment drawn from his expertise as a Portuguese translator and philosophical historian, the selection doubtless reflects a cultural belief more than literary analysis - an idea of what the "idea of Portugal" should be, loaded with hegemonic values and a preconstructed image that maps words to already defined (and desirable) meanings.  I wondered how one would evaluate and select the "mother-words" of a culture by examining its corpora, taking advantage of the erasure of meaning emphasized by Franco Moretti to create a less subjective tool for cultural analysis. [3]

I selected Book One of the core text of Portuguese identity, the epic poem Os Lusíadas, by Luís Vaz de Camões, as the starting subject for my investigation of the text analysis tools Voyant and Lexos.  For an easier introduction to the tools, I used the online version of William Julius Mickle's 1877 English translation provided by Project Gutenberg.[4]  From the beginning of the process, however, there were complications that alerted me to the limits of the tools' ability to provide an "objective" reading.  The operator's decisions regarding the selection and preparation of the text introduce the first layer of interpretation that could skew results.  Do I emphasize content, selecting only the lines of the Cantos themselves, or should I regard the book as an artifact in itself, including the prefaces and editorial commentary?  What distortions are produced by translation itself and by the translator's decisions, such as William Julius Mickle's use of 19th century conventions that approximated pronunciation by using apostrophes to contract words into fewer syllables?  Should those contractions be corrected to the standard spelling?  If I decide to clean up the text for analysis, how do I decide which parts of the text are important and which ones are trivial enough to be removed?  For the purpose of my first experiment, I decided to use the Mickle text as it was presented, without the editorial notes and preface, and with line numbers removed, always keeping in mind that this may reveal more about how 19th century English authors interpreted "the Lusiads" than about its relationship to Portuguese culture or ideology.
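The preparation choices raised above can be made concrete in a few lines of code.  The following is only a minimal sketch in Python, not the procedure that Voyant or Lexos applies.  It assumes that line numbers sit at the end of a verse line, and the contraction list is a hypothetical, hand-made mapping that would have to be compiled from the translation itself.  The sample verse is invented for illustration and is not quoted from Mickle.

```python
import re

# Hypothetical mapping of poetic contractions to full spellings.
# A real list would have to be built by reading the translation.
CONTRACTIONS = {"o'er": "over", "e'er": "ever", "heav'n": "heaven"}


def strip_line_numbers(text):
    """Remove a number that trails a verse line (assumed to be a line number)."""
    return re.sub(r"[ \t]+\d+[ \t]*$", "", text, flags=re.MULTILINE)


def expand_contractions(text, mapping=CONTRACTIONS):
    """Replace each listed contraction with its full spelling, ignoring case.

    The replacement is always lowercase, so capitalisation is not preserved.
    """
    for short, full in mapping.items():
        pattern = r"\b" + re.escape(short) + r"\b"
        text = re.sub(pattern, full, text, flags=re.IGNORECASE)
    return text


def prepare(text, expand=False):
    """Strip line numbers; expand contractions only if the operator asks for it."""
    cleaned = strip_line_numbers(text)
    return expand_contractions(cleaned) if expand else cleaned


# Invented sample, not a quotation from the Mickle translation.
sample = "The waves roll o'er the shore    5\nAnd heav'n looks on"
print(prepare(sample))               # keeps the contractions as printed
print(prepare(sample, expand=True))  # normalises them to full spellings
```

Even this small sketch embeds interpretive decisions: a verse that genuinely ends in a number would lose it, and the `expand` switch is exactly the "should that be corrected?" question, answered one way or the other by the operator before any counting begins.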

The resulting analysis by Voyant immediately generated some interesting results, as the word cloud showing the most frequently appearing terms presented words which were easily related to familiar meanings of the text - brave, shore, race, Gama.   It would have been easy to launch into an analysis relating them to what I already knew of the ideology of the Portuguese discoveries, especially as the immediately visible collected phrases recalled themes of domination, colonization, danger, and the mythology of Portuguese exceptionalism promoting heroic masculinity. Order of appearance could be interpreted as order of importance, and I became very aware of how naturally I could slide into a critical discussion of nationalistic themes in Portuguese literature rather than focusing on the data generated by the analysis itself.

A second analysis using the Lexos text analysis tool helped to caution against over-interpreting the results.  The word cloud generated in Lexos was very different from the result in Voyant, capturing more of the frequently appearing incidental words and presenting them in a less ordered manner.  I had to hunt for the terms that had appeared at the top of the Voyant list, and the jumble of words on the page, while visually interesting, was overwhelmingly cluttered by prepositions and pronouns.  Yet again, I saw patterns that recalled the key themes of Os Lusíadas, for example, the contrast of fear and dread to brave and bold.

Lexos offered more options for preparing the document for analysis, in particular the "scrubber" tool, which gave the operator many choices for narrowing the scope of the analysis. Yet, as Moretti and Pestre demonstrated in their discussion of the frequency of "and" in "Bankspeak: The Language of World Bank Reports", even these jumbles of frequently overlooked words can yield new insights.[5] Lexos also offered comparative analyses of different texts, which would allow the operator to analyze several different translations together.
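One plausible source of the difference between the two word clouds is whether common function words are filtered out before counting.  This is an assumption about the tools' settings, not something confirmed here.  The sketch below, in Python, counts word frequencies with and without a stop list.  The stop list is deliberately tiny and illustrative, and the sample sentence is invented rather than taken from the poem.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop list; real tools ship much longer ones.
STOP_WORDS = {"the", "and", "of", "to", "in", "a", "his", "with"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def top_terms(text, n=5, stop_words=None):
    """Return the n most frequent words, optionally skipping stop words."""
    words = tokenize(text)
    if stop_words:
        words = [w for w in words if w not in stop_words]
    return Counter(words).most_common(n)


# Invented sample, not a quotation from the poem.
sample = "the brave and the bold of the shore and the brave"
print(top_terms(sample, n=2))                         # function words dominate
print(top_terms(sample, n=2, stop_words=STOP_WORDS))  # content words surface
```

Neither list is the "true" one.  Filtering brings thematic words to the top, while the unfiltered count preserves the kind of evidence that the "Bankspeak" discussion of "and" relies on.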

In summary, even this limited investigation of these two tools showed both the possibilities and the problems of digital text analysis.  Provided the operator remains mindful of the bias introduced in the selection and preparation of data, the tools could make it possible to discover patterns across larger bodies of work, such as all English translations of Os Lusíadas, or reveal previously unnoticed differences, such as the changes in translations between different centuries.  Given enough time and effort, the tools could possibly produce the "mother expressions" of a culture's entire corpus of nationalistic literature, and the words and relationships identified by quantitative analysis may hold up a different cultural mirror than the words selected by subjective interpretation.

[1]  António Quadros, A Idéia de Portugal, quoted in Fernando Santoro, "Saudade," in Dictionary of Untranslatables: A Philosophical Lexicon, eds. Barbara Cassin, Emily Apter, Jacques Lezra, Michael Wood. (Princeton, NJ: Princeton University Press, 2014), 929.  Accessed online March 5, 2018, at
[2] António Quadros, O Espírito da Cultura Portuguesa: Ensaios (Lisbon: Soc. de Expansão Cultural, 1967) quoted in Donald Warren, Jr. “Portuguese Roots of Brazilian Spiritism”, Luso-Brazilian Review 5, No. 2 (Winter 1968), 6.
[3] Franco Moretti, “Patterns and Interpretation,” Pamphlets of the Stanford Literary Lab (2017), 2. Accessed March 4, 2018, at
[4]  Luís de Camões, The Lusiad, or The Discovery of India, an Epic Poem, translated by William Julius Mickle, London: George Bell and Sons, 1877.  Accessed March 5, 2018, at
[5] Franco Moretti and Dominique Pestre, “Bankspeak: The Language of World Bank Reports,” Pamphlets of the Stanford Literary Lab, (2015), 17.  Accessed March 6, 2018, at
