Distant Reading Exercises


Put in several search terms and explore how the Tweet data is charted in terms of emotion, tone, and content. Perform a "sanity check" of the Tweets you examine. Does the sentiment analysis seem to match what you're seeing, overall? What surprises you?


Put in some text you've written to run some basic stylometrics. See your text's grade level. Look at how often you use adverbs. How many hyper-embedded sentences have you used? We'll graph our data together, live in Google Docs, to get a sense of the range within the class.
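If you're curious how numbers like these might be computed, here is a minimal Python sketch. It assumes the grade level is a Flesch-Kincaid score and that adverbs can be roughly approximated as words ending in "-ly"; the tool itself may well work differently. The function names (`count_syllables`, `flesch_kincaid_grade`, `adverb_rate`) are made up for this illustration.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level, using the heuristic syllable counter above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

def adverb_rate(text):
    """Share of words ending in "-ly" (an assumption: a crude stand-in for adverbs)."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    ly_words = [w for w in words if w.endswith("ly") and len(w) > 3]
    return len(ly_words) / len(words)
```

The "-ly" shortcut will miss adverbs like "very" and miscount words like "family," so treat the result as a ballpark figure, not a measurement.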


Pick two Project Gutenberg texts, perhaps two by the same author or from the same time period. Download them and then clean them (remove the PG header and footer, plus any other framing text). Upload the two texts into Voyant. Experiment with the various frames visible to you: Concordance, Links, Terms. Some things here are similar to DataBasic; others are more off-the-wall. Let's especially talk about "Bubbles" and "TextualArc," tools that enable a kind of "distant reading" of an individual text that is less data-driven and more about the sonic experience of the text. Some research suggests that ears hear patterns better than eyes see them; these tools may let you listen for those patterns.
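Cleaning by hand works fine for two texts, but the step can also be scripted. The sketch below assumes your download is a plain-text file containing the usual "*** START OF THE PROJECT GUTENBERG EBOOK ..." and "*** END OF THE PROJECT GUTENBERG EBOOK ..." marker lines; some older files word these differently, so check the result by eye. The function name `strip_gutenberg_frame` is hypothetical.

```python
import re

# Assumed marker lines; older files sometimes say "THIS" instead of "THE".
START = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
END = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)

def strip_gutenberg_frame(raw):
    """Return the text between the start and end markers.

    If a marker is missing, fall back to the beginning or end of the file.
    """
    start = START.search(raw)
    end = END.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()
```

To use it, read your downloaded file into a string, pass it to `strip_gutenberg_frame`, and save the result as a new file to upload to Voyant. Any remaining framing text (transcriber's notes, tables of contents) still needs to be removed by hand.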

Other tools we'll discuss:


DataBasic includes two tools we'll discuss: WordCounter (which does a basic word count and bigram/trigram analysis) and SameDiff (which finds the cosine similarity score between two texts). 
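To make those two ideas concrete, here is a minimal Python sketch of n-gram counting and cosine similarity over raw word counts. It is an illustration under simple assumptions (lowercased words, no stop-word removal, no weighting), not DataBasic's actual implementation; the names `tokenize`, `ngram_counts`, and `cosine_similarity` are made up for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=2 bigrams, n=3 trigrams)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two texts' word-count vectors.

    1.0 means identical word proportions; 0.0 means no words in common.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, `ngram_counts(tokenize(text), 2).most_common(10)` lists the ten most frequent bigrams in a text. Because cosine similarity compares proportions, not totals, a short text and a long text can still score as very similar.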

Daniel Soper's Sentiment Analyzer 

Soper ranks the overall sentiment of texts from -100 (negative) to +100 (positive), using a machine learning algorithm trained on the American National Corpus.

Hacker Factor's Gender Guesser

Hacker Factor rates texts as "masculine" or "feminine" based on a bag-of-words model, developed from a corpus analysis. 

Yet More Cool Tools:
