The Computational Literary Studies Debate: Thoughts from the Fall 2019 Class, Writing in Digital Humanities

Computational Literary Studies: The New Frontier of Literature Research

Computational literary studies (CLS) began in the late nineteenth century with the manual counting of words, a process that computers later automated. Computing has since become increasingly common across literary studies, provoking both fierce support and fierce opposition. Many of those who oppose CLS believe that it undermines the close reading that only humans can perform. They note that CLS draws on limited funding, thereby reducing the number of close reading studies being done, and point out that many of the conclusions CLS studies produce are still statistically unreliable. Those who support CLS believe that it will soon enable faster readings of more material, with programs allowing humans to direct where the computer looks and what it looks for.

To compare the two sides of this debate, this paper evaluates two notable articles: Nan Da’s, which argues that CLS is ineffective, and Jonas Kuhn’s, which argues that CLS still has a great deal of potential. This paper will argue that funding CLS now, even though the field may not yet seem valuable, will allow it to quickly become a tool that literary scholars can use to make future research faster and more efficient.

Da’s article “The Computational Case against Computational Literary Studies” argues that CLS cannot and does not produce statistically sound conclusions because its programs cannot accurately interpret writing in context. A study by Matthew Jockers and Gabi Kirilloff linking verbs to pronouns had a success rate of around 81 percent, a figure Da notes disparagingly is only a “30 percent improvement over pure chance” (608); since pure chance would be right about half the time on a two-way choice like this, 81 percent accuracy amounts to roughly a thirty-point gain. Da goes on to claim that “there is a fundamental mismatch between the statistical tools that are used [for CLS] and the objects to which they are applied” (601). The statistical standards she expects CLS to meet are quite rigorous, despite the inherent subjectivity of literary interpretation even in close reading. Da describes the evidence required to establish correlation in statistics, then points out that the success rate of CLS studies often falls far below it (608). In the quote about the fundamental mismatch cited above, Da is essentially saying that CLS is incapable of producing statistically valuable information, and that anything less than statistically valuable is of no use. The article concludes that the use of computers to study literature is neither successful nor efficient, and that manual analysis is still the best option. CLS, she suggests, has little if any use in literary studies, and financial support for it should be withdrawn.

Literary critics ... tend to look to DH to help them account for literary objects that they feel are exponentially increasing. . . . Looking for, obtaining copyrights to, scraping, and drastically paring down “the great unread” into statistically manageable bundles, and testing the power of the model with alternative scenarios, takes nearly as much, if not far more, time and arbitrariness (and with much higher chance of meaninglessness and error) than actually reading them. (Da 638)


Quotes like this one argue that even when CLS is used, it saves researchers neither time nor money, while also making their conclusions less worthwhile (Da 638). While Da’s claim that CLS may not yet be reliable by the conventional statistical standard of a 0.05 p-value is perhaps reasonable, many others do not accept her conclusion that CLS will therefore always be useless and should be abandoned. Contrasting papers suggest instead that computer programs are continually being developed and are becoming ever more useful to literary studies.

In his paper “Computational Text Analysis within the Humanities: How to Combine Working Practices from the Contributing Fields?,” Kuhn discusses current research in computational literary studies and how the different focuses of CLS can be combined to draw more powerful conclusions. Kuhn argues that the techniques of computational text analysis can help answer questions from literary scholars that cannot be answered without the processing power of computers. For example, one scholar can closely analyze only a small number of books, while a computer can quickly process hundreds or thousands. The paper also proposes that computational literary studies may not yet have all the evidence needed to make meaningful claims about literature, and it suggests that collaboration between disciplines might solve this. Kuhn explains that often a “scholar keeps relying on … manual analysis rather than investing time into the development … of analytical tools that may have just a one-time application” (2019). By this, Kuhn means that literary scholars avoid such programs because they are too resource-intensive to develop relative to how much they will be used. Kuhn also suggests, with some hypothetical examples, that a new method of analyzing context and meaning called rapid probing could address issues of subjectivity, help fix problems of statistical insignificance, and be versatile enough to see much wider use.

The new methods being developed are intriguing. They may be an effective counterargument to Da’s assertion that funding should be withdrawn because CLS can never be statistically sound. Newer, more effective programming techniques are being developed in response to the complaint that computers cannot look past basic word counts without statistical error.

Da argues against CLS mostly on the premise that “CLS papers are more or less all organized the same way, detecting patterns based on word count” (605). This was especially true in the beginning of CLS, when computers had only the most basic capabilities. Word counts alone cannot take context into account: the sentiment expressed by a sentence may be entirely different from the sentiment attached to each individual word in it, as the sketch below illustrates.
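As a toy illustration of this limitation (not drawn from either article, and using a made-up four-word sentiment lexicon), the following Python sketch scores a sentence by summing the sentiment of each word in isolation. Because word order and negation are invisible to it, it misreads both test sentences:

# Toy word-count sentiment scorer with a made-up lexicon,
# illustrating how counting words in isolation misses context.
LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}

def word_count_sentiment(sentence):
    # Score each word independently; negation and word order are ignored.
    words = (w.strip(".,!?").lower() for w in sentence.split())
    return sum(LEXICON.get(w, 0) for w in words)

print(word_count_sentiment("The plot was not bad at all."))      # -1, though the sentence is positive
print(word_count_sentiment("The ending was anything but good."))  # +1, though the sentence is negative

Both sentences are scored opposite to their actual meaning, which is exactly the kind of context problem Da describes.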

Kuhn notes that some CLS methods being developed can begin to grasp context by looking for strings of words rather than single words, but he admits that “as soon as the application context deviates from the development scenario . . . the error rate will increase—possibly to a considerable degree” (2019). This quote indicates that even the more sophisticated programs cannot yet provide much true analysis or understanding of context. Further, it implies that there is no straightforward way to correct the programs by knowing their rate of false positives, because that rate will vary widely depending on the type of work the computer is reading.

In sharp contrast to this negative outlook, new methods of CLS are currently being developed, including rapid probing. Although rapid probing is not yet fully fledged, prototypes allow a computer to detect events, after which the “overall relations [of information about those events] can be visualized” (Kuhn 2019). This means computers are beginning to be able to recognize the context of words and then display information about that context.

For example, Kuhn discusses how a computer might analyze writing for descriptions of immigration (2019). The word “immigrate” may be used merely to discuss the idea, and without certain structures, such as a clear action verb, the program will not count that usage as an actual case of immigration (Kuhn 2019). So far, this method works best with clear and direct writing, such as nonfiction, but it shows the kind of analysis now being developed; a hypothetical sketch of this sort of filtering follows.
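A minimal sketch of such structure-based filtering, assuming a deliberately simple pattern rather than Kuhn’s actual system, might count “immigrated” only when it appears as the action of a concrete subject moving to a destination:

# Hypothetical sketch (not Kuhn's implementation): count "immigrate"
# only when it is the action of a subject with a destination,
# not when it merely names a topic of discussion.
import re

# Assumed pattern: subject word, past-tense verb, destination phrase.
EVENT = re.compile(r"\b\w+\s+immigrated\s+to\s+\w+", re.IGNORECASE)

sentences = [
    "The family immigrated to Canada in 1907.",     # a concrete event
    "The novel treats immigration as a metaphor.",  # topic only, no event
]

for s in sentences:
    label = "event" if EVENT.search(s) else "no event"
    print(label + ": " + s)

Only the first sentence is counted as an event; and, as Kuhn’s caveat about error rates suggests, so rigid a pattern would miss many other phrasings the moment texts deviate from the assumed form.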

While the initial quality of analysis may be limited, the idea of rapid probing allows scholars to explore how the analysis fits in an actual workflow on the target text data and it can thus provide early feedback for the process of refining the modeling. If the rapid probing method can indeed be incorporated in a hermeneutic framework to the satisfaction of well-disposed Humanities scholars, a swifter exploration of alternative paths of analysis would become possible. (Kuhn 2019)


This statement from Kuhn implies that rapid probing could not only come to understand context but also give its designers early feedback, letting them add elements so that the program captures context more accurately. Although the initial implementation and optimization of this software might be time- and money-intensive, the amount of time the completed program could save would be considerable.

CLS has drawn much criticism from some scholars on the grounds that it was originally based on word counts, which take neither context nor alternative meanings into account. But new techniques, rapid probing especially, are emerging that improve on word-count analysis; rather than withdrawing funding and shrinking CLS, we should encourage the field further. CLS will improve, not destroy, literary analysis.


Works Cited

Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, no. 3, 2019, pp. 601–639, doi:10.1086/702594.

Kuhn, Jonas. “Computational Text Analysis within the Humanities: How to Combine Working Practices from the Contributing Fields?” Language Resources and Evaluation, 2019, doi:10.1007/s10579-019-09459-3.
