Literature as "Data"
Literature as "Data"?
Data/datum: identifying what’s “given” in a text is part of thinking about how one might set about gathering, then analyzing, what’s there.
- OED definition(s)
- datum (ˈdeɪtəm)
- Pl. data (ˈdeɪtə). [L. datum given, that which is given, neut. pa. pple. of dare to give.]
- 1. a. A thing given or granted; something known or assumed as fact, and made the basis of reasoning or calculation; an assumption or premiss from which inferences are drawn.
- 1. d. pl. The quantities, characters, or symbols on which operations are performed by computers and other automatic equipment, and which may be stored or transmitted in the form of electrical signals, records on magnetic tape or punched cards, etc.
- Angle: learning to look at what’s “given” in a new way can be one benefit of working with digital tools
- Context: Trevor Munoz/Katie Rawson and others (shout out to Cordell and Tarpley bibliographies)
Why You Should Care
- Thinking about literature as data: linked to having a sense of how one might use these computational tools in the first place, as well as why -- some of the many reasons why -- one might wish to do so.
What It Isn’t (or Doesn’t Have to Be…)
- What thinking of ‘literature as data’ does not demand is locking oneself into a paradigm of “distant reading” only, with visualizations some affectionately refer to as “data hairballs”…
- Imagining that one diagram will explain—or contain, or answer—all questions one might generate about a topic fails to do justice to the complex, variable activities of reading itself.
- Moving back and forth among scales of understanding is vital.
- As Ted Underwood's comments on Martin Mueller's “scalable reading” suggest, doing this is challenging--for everyone! Yet this ability to “scale” is something you, as students and scholars, are developing all the time: quick examples. Think of computational methods as extending your reach in particular, definable, and comprehensible ways.
- Example: Serendip's Shakespeare_50 Model [[with quick directions on changing scales]]
Patterns
- Epigraph: Let/make “the facts arrange themselves” (Eliot)
- Humans notice and discuss patterns among works they read, too—this is how we’ve come up with retroactive genre categorizations, etc. What digital tools can provide, however, are readily rendered means of bringing forward some of the patterns that we as human readers have difficulty noticing—the tendency of an author to prefer coordinating over subordinating conjunctions, or adjectival to adverbial phrases, for instance.
- Understanding how a particular program or piece of software tends to organize textual information (in other words, what sort of “data” it is likely to yield about a work or group of works) will help you place in context the sorts of things it is likely to show you, as stepping-stones for further inquiry.
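To make the conjunction example concrete, here is a minimal sketch of how such a pattern might be tallied. It is an illustration only, not the method of any particular tool: the function name `conjunction_profile` and both word lists are assumptions made for this example, and the lists are deliberately short rather than exhaustive.

```python
import re

# Assumption: small illustrative word lists, not a complete grammar.
# Several of these words ("for", "so", "yet", "since") also serve other
# grammatical roles, so plain word matching will over-count them.
COORDINATING = {"and", "but", "or", "nor", "for", "so", "yet"}
SUBORDINATING = {"although", "because", "since", "unless", "while",
                 "whereas", "if", "though", "until"}


def conjunction_profile(text):
    """Count coordinating and subordinating conjunctions in a text."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    coordinating = sum(1 for w in words if w in COORDINATING)
    subordinating = sum(1 for w in words if w in SUBORDINATING)
    return {
        "coordinating": coordinating,
        "subordinating": subordinating,
        "total_words": len(words),
    }


sample = "She ran and he walked, but they met because it rained."
print(conjunction_profile(sample))
```

Counts like these are a starting point for a question, not an answer: a high proportion of coordinating conjunctions is a prompt to return to the passages themselves and read them closely.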
The Specter of “Big Data” [[**may not include this part**]]
No phrase seems more likely to set humanities scholars’ collective hair on end than “big data,” with its associations of impersonality. That perceived emphasis on the impersonal may seem to imperil the careful qualitative consideration and analysis many scholars in the humanities and social sciences hold dear.
Understanding distant reading tools as providing ways into a larger problem can be helpful. As literary critic and scholar Christopher Ricks is fond of saying, those pursuing scholarly or critical questions must form “a handle to get hold of the bundle” before inquiry can progress. Metaphorically, the same is also true of digital work. However beautifully patterned a visualization may be, developing a sense of orientation to the results is crucial to making use of the insights it may present.
Keep In Mind . . .
- If at first it doesn’t work, don’t lose heart! [[advice on patience here]]
- As we change frames of reference, our brains try to reorient us using cues with which we are familiar, which is part of the reason it’s easy to become distracted through the Internet’s “pages.”
- If you’re feeling lost, try to remember why you asked your question in the first place, then look for the clues that will help you move forward!
- Jotting notes about your question ahead of time may also help.
- What can you type into the interface, and how does the tool then place this in context? In other words, what’s your “handle”? And why should others care about it?
Illustrations [[potentially, interactive?]]
- For instance: Voyant
- Some points of entry: words, phrases, clusters...
Managing Metadata
- Something of a chicken and egg argument:
- Can’t gather files without metadata—they have to have names, after all!
- Yet it’s hard to appreciate what can make metadata interesting from just one file. Using metadata to organize your corpus allows you to create groups that are both comprehensible and computationally tractable.
- Metadata: defined as “data about data”: sense of “boxes all the way down”:
- Once you’ve developed your “handle,” or your research question/interest, you’ll often find that taking a step back to think about exactly what you’re looking at helps (cite VP).
- [[more here]]
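As a rough illustration of how metadata can turn a pile of files into comprehensible groups, the sketch below sorts a small corpus by one metadata field. Everything in it is hypothetical: the filenames, the field names (`author`, `year`, `genre`), and the helper `group_by` are placeholders invented for this example, not a prescribed schema.

```python
from collections import defaultdict

# Hypothetical metadata records: one dictionary per file in the corpus.
corpus = [
    {"filename": "text_a.txt", "author": "Author A", "year": 1850, "genre": "novel"},
    {"filename": "text_b.txt", "author": "Author B", "year": 1855, "genre": "poetry"},
    {"filename": "text_c.txt", "author": "Author A", "year": 1861, "genre": "novel"},
]


def group_by(records, field):
    """Group filenames by the value of a single metadata field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record["filename"])
    return dict(groups)


print(group_by(corpus, "genre"))
print(group_by(corpus, "author"))
```

The same files fall into different groups depending on which field you choose, which is one way of seeing why a single file's metadata seems dull while a corpus's metadata shapes the questions you can ask.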