Fitz-Gibbon, Bernice. “Woman in the Gay Flannel Suit.” New York Times; Jan 29, 1956.
Jasmine Drudge-Willson, 2016-12-18
This page is referenced by:
Optical Character Recognition
I used the online OCR software at www.onlineocr.net to convert the New York Times articles, which had been downloaded as "image-text" PDFs, into text-based data that Voyant could read. While the software is technically free, it only allows the user to upload single-page PDFs without an account. To upload more than one page, one must create an account. An account allows the conversion of up to 25 pages in total, after which one must purchase pages, with prices ranging from $4.95 for 50 pages to $399.95 for 50,000 pages. Since I needed to use the software for 40 articles, I ended up creating multiple accounts with different e-mail addresses to keep the service free.
While surprisingly effective, the OCR output still required a large amount of data cleaning: some words or characters were not read properly, and any ink marks on the PDFs were interpreted by the software as numbers, letters, or symbols. I found that older texts with messier ink marks or more ornate typefaces were much harder for the OCR to interpret than articles closer to the present day. These issues required me to compare each OCR file with the original article PDF to ensure an exact duplicate of the text. This made the process very time-consuming and contributed to the small sample size of articles used in the project.
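The kind of cleaning described above could be partly sped up by flagging tokens that look like OCR noise before checking them against the PDF. The following is only a minimal sketch in Python, not the process actually used in this project: the function name `flag_suspect_tokens`, the two heuristics (letters mixed with digits, and tokens made up entirely of symbols), and the sample text are all assumptions invented for illustration.

```python
import re

# Heuristic patterns for tokens that OCR may produce from ink marks or
# misread glyphs. These rules are illustrative assumptions, not a standard.
LETTER_DIGIT_MIX = re.compile(r"(?=.*[A-Za-z])(?=.*\d)")  # e.g. "wom4n"
STRAY_SYMBOLS = re.compile(r"^[^A-Za-z0-9]+$")            # e.g. "~|"
ORDINARY_PUNCTUATION = {".", ",", ";", ":", "!", "?", "-", "(", ")", '"', "'"}


def flag_suspect_tokens(text):
    """Return (line_number, token) pairs worth checking against the PDF."""
    suspects = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Free-standing ordinary punctuation is not treated as noise.
            if token in ORDINARY_PUNCTUATION:
                continue
            if LETTER_DIGIT_MIX.search(token) or STRAY_SYMBOLS.match(token):
                suspects.append((line_number, token))
    return suspects


# Invented sample text, not taken from any of the articles.
sample = "The wom4n wore a gay flannel suit ~| today.\nShe earned $50 in 1956."
for line_number, token in flag_suspect_tokens(sample):
    print(f"line {line_number}: {token!r}")
```

A check like this only narrows down where to look. It will miss misread words that happen to be made of letters only, and it will flag legitimate tokens such as "29th", so a comparison with the original PDF would still be needed.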