Voyant
Cirrus presents word frequency as a colourful word cloud in which the larger a word appears, the more frequent it is. Clicking any word in the cloud shows how its frequency changes over time in the Trends graph. Note that the X-axis of the Trends graph cannot display accurate year markers; the decades are, however, placed in chronological order, and each carries roughly the same weight as the others. The timeline on The New York Times Project page therefore begins at 1940, reaches the late 1970s and early 1980s around its midpoint, and ends at 2016. The Reader window contains the entire corpus, should you care to read through it, and any term in the Reader can be clicked to see its frequency. If you prefer a smaller frame of reference, Contexts lists every point at which a term occurs, along with the five words preceding and following it. At the bottom of every tool there is also an option to search for specific words which may not have been common enough to appear in the display by default. While I found these tools the most informative for my research interests, Voyant offers many other forms of text-mining which are worth exploring.
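For readers curious about what sits behind a word cloud or a Contexts listing, the idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Voyant's actual implementation: the function names (`tokenize`, `term_frequencies`, `contexts`), the very simple tokenizer, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(text):
    """Count how often each token occurs, the kind of tally a word cloud is drawn from."""
    return Counter(tokenize(text))


def contexts(text, term, window=5):
    """Return (left, term, right) for every occurrence of `term`,
    with up to `window` words on each side."""
    tokens = tokenize(text)
    term = term.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token == term:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            rows.append((" ".join(left), token, " ".join(right)))
    return rows


# Hypothetical sample text, standing in for one article from the corpus.
sample = (
    "Women and girls in the city marched today, and the women "
    "who organized the march spoke to reporters afterwards."
)

print(term_frequencies(sample).most_common(3))
for left, term, right in contexts(sample, "women"):
    print(f"{left} [{term}] {right}")
```

The window of five words mirrors what the Contexts tool shows; a term near the start or end of a text simply gets fewer neighbouring words on that side.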
I began experimenting with the manipulation of data in Voyant by uploading the URLs of pages listing articles tagged by the New York Times as pertaining to Women and Girls. These pages featured lists of articles, showing each article's date of publication, author, title and a brief description of its contents, but did not include the body of the texts themselves. I had hoped that the URLs would bring in the full text of every article and so simplify the process of uploading and text-mining the data, but this proved unsuccessful. Instead, I had to upload the text of each article individually. This required me to copy and paste the articles into a Word document one by one, and then clean the data by removing the text of any advertisements that had been included on the page. This limited the scope of my data set: I had originally hoped to include 100 articles from each decade, but scaled this down to a more modest 10 articles per decade to work within the time constraints. Searching for New York Times articles through ProQuest allowed me to access articles dating back to 1940; however, any that had been published before 1980 were only accessible in a PDF format unreadable by Voyant. This required the use of optical character recognition (OCR) software, OCR being "the recognition of printed or written text characters by a computer. This involves photoscanning of the text character-by-character, analysis of the scanned-in image, and then translation of the character image into character codes" (http://searchcontentmanagement.techtarget.com/).