This page was created by Karen Hallett.  The last update was by Andrea Davis.

Workbook for Introduction to Digital Humanities: A-State

Anonymous's Text Analysis






For this exercise I thought it would be interesting to compare the US Budget and the Economic Report of the President (ERP). The first image shows just the budget, and as one would expect, "budget" is the most used word, with "programs" second.  For the Economic Report of the President, "percent," "https" (there are obviously lots of links in this document), "U.S.," and "tax" are the most frequent words.  I love that Voyant shows us in which sections of a document these words are most used. In the ERP, for example, the URLs are clearly concentrated in the latter third of the document.

The last image is an example of how one can create a corpus from multiple documents and find keywords and other information.  In this image, "budget" drops to second place, with "percent" the most frequent word.  Quick calculations, comparing the combined counts with the single-document counts, show that the word "percent" is used only 75 times in the budget.  We also see that URLs appear in the budget only 3 times, and "tax" is used only 64 times there.  Taken together, the three images reveal quite a bit about the documents. While these two documents are vastly different, it might be interesting to see how the language differs between two documents of a similar type.
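For anyone curious how the "quick calculations" work outside of Voyant, here is a minimal Python sketch of the same idea. It is only an illustration: the two short texts are made-up stand-ins rather than the real documents, the helper name `word_counts` is my own, and the simple tokenizer will not match Voyant's counts exactly.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word (letters, digits, apostrophes)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


# Hypothetical stand-in texts, NOT the real Budget or ERP.
budget_text = "The budget funds programs. The budget sets tax policy."
erp_text = "Tax receipts rose 3 percent. Output grew 2 percent. See https://example.gov"

budget = word_counts(budget_text)
erp = word_counts(erp_text)

# Adding two Counters gives the counts for the combined corpus.
corpus = budget + erp

# A word's count in one document is the corpus count minus its count in the other.
for term in ("budget", "percent", "tax", "https"):
    budget_only = corpus[term] - erp[term]
    print(term, "corpus:", corpus[term], "ERP:", erp[term], "budget:", budget_only)
```

The subtraction in the loop is the same step as reading a word's count off the combined image and taking away its count from the ERP image.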

Voyant could be a very interesting tool for tracking how the language of budgets or economic reports changes over time, showing which topics are emphasized and which are minimized.
