Controversy and Integration in the Computational Era of Literature
Samuel Henry

Humans vs. Machines?
Can human emotions be boiled down to a series of on and off switches? As technology grows at a lightning-fast rate, the line between what is human and what is machine has never been more blurred. One field evolving to match this new relationship is the humanities, specifically literary analysis. Computational Literary Studies (CLS) is the field that uses computer processes (such as word counts, graphing data, and natural language processing) to assess works of literature for genre, style, and more. While some have welcomed computers into the world of the humanities with open arms, many have reservations about turning literary pieces into statistical data points. This controversy has led many to argue against the use of CLS in academic studies. There are many issues with using computers in literary studies; however, we need to find ways to carefully integrate the power of computers into how we analyze literature in order to harness the advantages they give us.
“Data-mine something and you will always find significant associations” (Da, 3)
In her article, “The Computational Case against Computational Literary Studies,” Nan Da works to point out faults in the conclusions and logic of CLS studies. She argues that statistical measures do not help draw meaningful conclusions when applied to literature, specifically when it comes to counting word frequencies and using them to prove a “statistical” shift in a genre (1). Da also points out how loosely many of these studies use statistics. At one point, for example, she notes the irrelevance of applying the tightly defined term “statistically significant” to the highly subjective literary world. As she puts it, “Data-mine something and you will always find significant associations” (3). By convention, a result is called statistically significant when there is less than a 5% probability that it arose by chance. However, Da argues, such a simplified standard does not transfer to the world of literature: 94% of the data you mine in CLS could disagree with your conclusion, but the handful of associations that clear the significance bar by chance alone would still let one mathematically argue nearly any viewpoint (3). Certainly, this is a solid argument against the use of computers in analyzing literature. By using numbers to fight a numerical method of analysis, Da seems to beat CLS supporters at their own game. But why are we holding CLS to such a strictly objective standard while we allow non-CLS studies to take much more subjective leaps in logic?
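Da's data-mining point can be illustrated with a quick simulation (my own hypothetical sketch, not an example from either article): generate purely random “word frequency” features, correlate each with an equally random target variable, and count how many clear the conventional 5% significance threshold by chance alone.

```python
import random

random.seed(0)

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Pretend each "feature" is a word frequency measured across 50 texts.
texts = 50
target = [random.random() for _ in range(texts)]

# Mine 1,000 purely random features and count how many correlate
# "significantly" with the target. For n = 50, a correlation of
# roughly |r| > 0.28 corresponds to p < .05 (two-tailed).
cutoff = 0.28
hits = 0
for _ in range(1000):
    feature = [random.random() for _ in range(texts)]
    if abs(correlation(target, feature)) > cutoff:
        hits += 1

# Every feature is pure noise, yet roughly 5% of the 1,000
# tests (around 50) will look "significant" anyway.
print(hits)
```

Nothing here measures anything real, yet dozens of “significant associations” appear, which is exactly the pitfall Da describes: mine enough variables and some will always cross the threshold.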
This is the kind of aggressive stance toward CLS that Da maintains throughout much of her article. As she introduces dissenting opinions, she barely touches them before shooting down the weaknesses she finds in CLS. In fact, when she introduces an essay by Andrew Piper, she first primes the reader with a short paragraph about its “fallacious overclaims and misinterpretations” before even summarizing the material of Piper's study (Da, 3). Da gives us no solutions for remedying the problems she sees, and despite small nods to the strengths of CLS on the first page of her article, I find myself wondering if she really considered any of the possible benefits of using computers to analyze literature. It seems to me a waste to simply write off computers as having no place in the world of text analysis when they have become such an important catalyst for the creation of modern-day literature.
As it turns out, Jonas Kuhn agrees with me. In his article, “Computational Text Analysis within the Humanities: How to Combine Working Practices from the Contributing Fields?,” Kuhn not only points out some of the difficulties of working between digital and humanities fields but also poses solutions for effectively addressing them. Kuhn discusses two problems with Computational Literary Studies. The first is a “scheduling dilemma” (2019): there are practical conflicts over when conclusions are drawn in a computational study versus a humanities study. Kuhn claims that in a literary study conclusions are usually drawn at the end of research, while in computational studies hypotheses are stated first and then tested. As Kuhn illustrates in a pair of diagrams, this leads to conflict over when this crucial stage should be conducted in CLS (2019). However, we are not left in the dark on how one might solve this issue. In a third diagram, Kuhn shows us a theoretical workflow solution (2019). This solution may not work, but that is beside the point: Kuhn has accepted the faults in CLS and is working to correct them.
We also see Kuhn’s method of defining and solving problems in CLS in how he addresses the issue of subjectivity. Many critics of CLS argue that computers are unable to fully understand the breadth of cultural, historical, and aesthetic values associated with literature.
“The objective of systematicity in annotation and the concession that certain annotations are influenced by subjective judgments do not necessarily exclude each other” (Kuhn, 2019).
Kuhn makes the point that data derived from computers is not that dissimilar from what you get by close-reading a text. Conventionally, Kuhn argues, an author would guide their “second look” at a text through some subjective lens. Why can this not be done with computers (Kuhn, 2019)? Computers are fast and could certainly be used to speed up the process of looking over and close-reading hundreds of texts. Kuhn sees that the studies of the future could use computers, carefully, to help supplement subjective arguments. In Kuhn's mind, computers would not displace human thought but would aid the speed and accuracy of the data collected in literary studies.
Kuhn's article does a good job of recognizing the shortcomings of CLS while offering well-thought-out (if still theoretical) solutions to these problems so that we can reap the benefits of CLS. I appreciate his frank undertaking of the issues surrounding CLS. Instead of giving up, he uses research and outside sources to come up with theoretical solutions to the issues he raises.

Humans and Machines in Harmony
Reviewing these sources, we see much of the controversy that still surrounds the results of CLS. It is fair to say there are still many things that computers cannot do when it comes to studying literature. Computers cannot fully comprehend the complex, subjective, aesthetic, and cultural values that are crucial to the humanities. However, as Kuhn points out, there are many advantages to using computers in how quickly they can sift through massive amounts of text and increase our understanding of patterns within large bodies of work. But we need to be careful in how we apply these methods. Da certainly shows us there are many pitfalls (such as the misuse of statistical significance in analyzing literary data) to avoid when drawing conclusions from machines that cannot comprehend the entirety of the works they evaluate.

The world is changing, and so should how we study and analyze literature. This changing environment is the one thing neither article seems to address very clearly. Our world is now an intensely electric and technological one. Simply look around and you are bound to see some kind of digital display (if you are not staring at one currently). Thus we need to find out how to take advantage of the new capabilities high tech affords us. Kuhn's solutions in this regard are rather theoretical and rough-cut; more thought should be put into how to apply ideas like his. The future is in harnessing computers, not in stifling new methods as useless before their capabilities have been fully explored.
Da, Nan. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, no. 3, The University of Chicago Press, Spring 2019. https://doi.org/10.1086/702594

Kuhn, Jonas. “Computational Text Analysis within the Humanities: How to Combine Working Practices from the Contributing Fields?” Language Resources and Evaluation, Springer Netherlands, 26 June 2019, pp. 1–38. https://link.springer.com/article/10.1007%2Fs10579-019-09459-3