Arguments Against and For Computational Literary Studies in Digital Humanities
First, we will consider the critics of DH. Nan Z. Da, in her paper “The Computational Case against Computational Literary Studies,” argued that computational methods are not suitable tools for analyzing literature. Her critique targets Computational Literary Studies (CLS), a subfield of DH. She sharply pointed out that almost all arguments in CLS papers are built on word counts, and that these superficial statistics fail to explain or analyze the human emotion in literature.
Everything that appears in CLS, then, is simply a fancier way of reporting word frequencies, and it is futile: readers are left with a pile of complicated text statistics that yield either no findings or inaccurate ones. “In CLS data work there are decisions made about which words or punctuations to count and decisions made about how to represent those counts. That is all. The highest number of consecutive words (n-grams) that CLS work has looked at is three (trigrams). Mark Algee-Hewitt looks at the probabilities of bigrams (the likelihood that one word will be followed by another specific word) to calculate corpus ‘entropy,’ but this is just another way of saying ‘two words that appear together’” (Da, 606).
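To make the method Da is describing concrete, here is a minimal sketch of a bigram-entropy calculation of the kind she attributes to Algee-Hewitt. This is my own illustrative reconstruction in Python, not his actual code; the sample sentence and the whitespace tokenization are toy assumptions.

```python
import math
from collections import Counter

def bigram_entropy(tokens):
    """Shannon entropy (in bits) of the bigram distribution of a token sequence."""
    bigrams = Counter(zip(tokens, tokens[1:]))  # count adjacent word pairs
    total = sum(bigrams.values())
    return -sum((n / total) * math.log2(n / total) for n in bigrams.values())

# Toy corpus; a real study would tokenize a full text.
tokens = "the lord is my shepherd i shall not want".split()
print(f"bigram entropy: {bigram_entropy(tokens):.3f} bits")
```

As Da’s gloss suggests, the computation reduces to counting which pairs of words appear together and how evenly those counts are distributed.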
Da included many examples to illustrate the inefficiency of CLS. For instance, she used Andrew Piper’s essay “Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel” as an example of superficial word counting (Da, 611). Piper inferred a structural difference between texts from a difference in word frequencies. He stated that the last three books of Augustine’s Confessions are significantly distinct both from the first ten and from each other. “Piper attributed this difference to an experience of conversion in book 10—an experience that he argues made a real difference in terms of vocabulary output” (Da, 612). But anyone who has read Confessions and analyzed the text by hand knows that this variation arises because Augustine turns to a discussion of Genesis after spending ten books on autobiography. Of course different words show up, and the conversion has nothing to do with the difference between the books. Through this demonstration, she clearly showed me that analyzing word frequency alone, while ignoring syntax, position, context, and semantics, leads to misinterpretation of the text and fallacious conclusions. However, this evidence was not compelling enough to completely deny the utility of DH in literary analysis. I found other papers offering rebuttals to Da’s arguments.
In “Future Directions and a Roadmap in Digital Computational Humanities for a Data Driven Organization,” the authors Shalini Upadhyay and Nitin Upadhyay claimed that computational artifacts, presumably including those of computational literary studies, are needed to collect and understand the tremendous volume of digitized tradition, heritage, and culture.
To support this argument, the authors provided examples of work in DCH, its theoretical foundations, and expectations for its future. The examples of related work in DCH give firm support to the point the essay proposes.
“Sterling attempts different computational intensive artifacts to generate a wide spectrum of the literary corpus. Stickney proposes a new set of skills including digital literacy, computational pedagogy and computational literacy for excelling in DCH” (Upadhyay, 1056).
These examples demonstrate the ease of collecting and analyzing literature with computational artifacts, which contradicts Da’s view of CLS as inefficient and useless. Therefore, I firmly advocate that DH and its various computational artifacts can greatly help the study of literature.
While Da focuses mainly on a very narrow point, the word-count method in CLS, Upadhyay and Upadhyay attend to a variety of DH techniques such as network reconnaissance, distant reading, and quantitative literary analysis. With the assistance of multiple computational artifacts, it is much easier to analyze digitized literature and data. For example, to explore a text collection, researchers can generate a literary corpus and easily compare it against another to see its distinctive features. Kilgarriff describes a simple yet efficient way to get to know a document collection by comparing lists of its top 100 keywords (Kilgarriff, 2012). It is unfair to dismiss CLS by merely denigrating the utility of word counts.
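As a rough illustration of that keyword-comparison idea, the sketch below pulls the most frequent words from two corpora and reports which keywords are distinctive to each. It is a simplification under my own assumptions (hypothetical plain-text files named corpus_a.txt and corpus_b.txt, naive tokenization, raw frequency ranking); Kilgarriff’s actual method involves more careful keyword scoring.

```python
from collections import Counter

def top_keywords(text, n=100):
    """Return the n most frequent lowercase word tokens in a text."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    counts = Counter(w for w in words if w)
    return [word for word, _ in counts.most_common(n)]

# Hypothetical file names standing in for two digitized corpora.
corpus_a = open("corpus_a.txt", encoding="utf-8").read()
corpus_b = open("corpus_b.txt", encoding="utf-8").read()

keys_a, keys_b = set(top_keywords(corpus_a)), set(top_keywords(corpus_b))
print("distinctive of corpus A:", sorted(keys_a - keys_b))
print("distinctive of corpus B:", sorted(keys_b - keys_a))
```

Even this crude comparison makes the point of the paragraph above: a few lines of counting code can surface the vocabulary that sets one collection apart from another.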
I also found another paper, “Word Counts and Topic Models,” supporting the effectiveness of computational literary studies, and of the word-count method in particular. Word frequencies are easy to calculate and can provide background information about a literary text: “Lexical features such as word, sentence, and text length, for example, give indications on style, genre, and type of text” (Günther & Quandt, 80).
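Such lexical features are trivial to compute. Here is a minimal sketch of the kind of length-based profile Günther and Quandt mention, under my own assumptions about tokenization; it is an illustration, not their implementation.

```python
import re

def lexical_profile(text):
    """Crude lexical features: token count, average word length, average sentence length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "words": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
    }

print(lexical_profile("Short words. Short sentences. Plain style, perhaps."))
```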
The Voyant Tools corpus cited in the Works Cited is based on an analysis of the “Declaration Concerning Lawful Sports to Be Used,” issued by King Charles I in 1633. The declaration mainly argued that laborers should have time for lawful sports and recreation so as to better serve God. That information is clearly exhibited by a tool that uses word counts to analyze literature. Although it is difficult for a computer to understand the meaning a text’s author intended, complexity can be reduced by breaking the text down into a list of words. The most frequent words, for example, help give an impression of the document’s vocabulary. We can also use different options to study keywords in context: instead of calculating the frequency of single words, frequent words can be counted jointly (Günther & Quandt, 80). For example, clicking on the “Links” tab in the Voyant visualization clearly shows the connections between a single word and other frequent words. This network between words gives me a deeper insight into the content of the declaration. For instance, the word “lawful” is bound to “recreations” and “sports,” “people” is followed by “shall,” and “service” is linked to “god” and “church”; these links inform me about the relationship between the people’s right to recreation and their divine duty.
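The idea behind a links view like Voyant’s can be approximated with a simple co-occurrence count within a small window of words. The sketch below is my own approximation of that idea, not Voyant’s implementation; the window size and the sample phrase are assumptions.

```python
from collections import Counter

def collocations(tokens, window=5):
    """Count how often each pair of words co-occurs within a sliding window."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:  # words within the window
            pairs[tuple(sorted((word, other)))] += 1
    return pairs

# Sample phrase in the spirit of the declaration's language (not a verbatim quote).
tokens = "that the people shall have lawful recreations and lawful sports after divine service".lower().split()
print(Counter(tokens).most_common(3))
print(collocations(tokens).most_common(3))
```

Pairs with high counts, such as “lawful” with “sports,” are exactly the edges drawn in a links network.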
Upadhyay and Upadhyay’s paper maintained a neutral stance, presenting both the advantages and the shortcomings of DCH. The authors not only argued that computational artifacts are efficient tools but also admitted their limits, showing the duality of Digital Humanities and computational artifacts. They state, “It [DCH] limits the overall focus of humanities and just targets at one aspect of it in the digital world” (Upadhyay, 1056). It is true that computational artifacts are widely used in the liberal arts and social sciences, but their focus is narrow, addressing only the foundations of computational thinking, cybersecurity, digital forensics, and cyber participation. In that respect, the two papers hold similar perspectives. But the limits of DH are not sufficient to outweigh its advantages. The position supporting DH is more compelling.
In conclusion, Computational Literary Studies has its limits in understanding the humanity in literature, but it can help researchers grasp the basic topics and background of texts, and researchers can flexibly employ a wide range of computational artifacts to assist their studies.
Works Cited
Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, no. 3, Spring 2019.
Günther, Elisabeth, and Thorsten Quandt. “Word Counts and Topic Models.” Digital Journalism, vol. 4, no. 1, 2016, pp. 75-88, doi:10.1080/21670811.2015.1093270.
Kilgarriff, Adam. “Getting to Know Your Corpus.” Text, Speech and Dialogue, 2012.
King Charles I. “Declaration Concerning Lawful Sports to Be Used.” 1633.
Upadhyay, Shalini, and Nitin Upadhyay. “Future Directions and a Roadmap in Digital Computational Humanities for a Data Driven Organization.” Procedia Computer Science, vol. 122, 2017, pp. 1055-1060, doi:10.1016/j.procs.2017.11.473.
Voyant Tools, https://voyant-tools.org/?corpus=675aaad5133b6d703408b8847145fc24.