In the Margins

The Content Analysis Tool

The Lexos Content Analysis Tool allows you to compare the presence of words from multiple dictionaries (lists of words) within your uploaded documents. The words in each dictionary are tallied for each document. You can then build your own equation to compute a single score for each document based on the words found in each dictionary. For example, if you have previously uploaded a document containing a concatenated list of tweets, you can use this tool to upload two dictionaries (e.g., happyWords.txt and sadWords.txt) and then build an equation such as [happy] - [sad]. Given this equation, the tool computes the total number of happy words found minus the total number of sad words found in each document. Once you click the Analyze button, a final score indicates a "happiness" measure for your collection of tweets.
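The tally-then-evaluate idea can be sketched in a few lines of Python. This is not Lexos's implementation; it is a minimal sketch that assumes simple whitespace/word tokenization and single-word dictionary terms, with hypothetical dictionaries and sample text.

```python
import re

def tally(text, dictionary):
    """Count total occurrences of the dictionary's terms in the text."""
    words = re.findall(r"\w+", text.lower())
    terms = {t.lower() for t in dictionary}
    return sum(1 for w in words if w in terms)

# Hypothetical dictionaries and a tiny "document" of tweets
happy_words = ["glad", "joy", "smile"]
sad_words = ["gloomy", "tears"]
tweets = "So much joy today! A smile for everyone, no tears."

# The formula [happy] - [sad], evaluated by hand
score = tally(tweets, happy_words) - tally(tweets, sad_words)
print(score)  # 2 happy terms minus 1 sad term = 1
```

A positive score suggests the happy dictionary dominates; a negative score suggests the sad dictionary does.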

The Content Analysis Tool has various applications, such as opinion mining, sentiment analysis, and determining organizational hardiness in stockbroker reports.

Usage

  1. Upload the documents you wish to explore using Lexos' Upload page, as usual, and scrub the documents if necessary. 
  2. On this Content Analysis page, upload your dictionary file(s).
  3. Build a formula relating the scores obtained from each dictionary (e.g., [happy] - [sad]).
  4. Click on the Analyze button.
  5. Review the right-most column in the initial result table to see your "score".

Interpreting the Results

Three groups of tables will be displayed as results:

Assuming happy and sad dictionaries are used, the first table displays, for each document, the number of happy and sad terms present, the computed formula value, the total word count, and the score, which is the formula result divided by the total word count. The average of each of these columns is also included in the table.
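One row of that first table can be sketched as follows. This is a hypothetical helper, not Lexos code; it assumes word tokenization with a regular expression and a formula passed in as a Python function.

```python
import re

def count_terms(text, terms):
    """Count occurrences of the given single-word terms in the text."""
    words = re.findall(r"\w+", text.lower())
    vocab = {t.lower() for t in terms}
    return sum(1 for w in words if w in vocab)

def document_row(text, dictionaries, formula):
    """One row of the first results table: per-dictionary counts,
    formula value, total word count, and score = formula / word count."""
    counts = {name: count_terms(text, terms)
              for name, terms in dictionaries.items()}
    value = formula(counts)
    total = len(re.findall(r"\w+", text))
    return {**counts, "formula": value,
            "word_count": total, "score": value / total}

dicts = {"happy": ["glad", "joy"], "sad": ["gloomy", "tears"]}
row = document_row("joy and more joy, then tears", dicts,
                   lambda c: c["happy"] - c["sad"])
print(row)  # formula = 2 - 1 = 1; score = 1 / 6 words
```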

The second table displays a ranking of the most frequently occurring dictionary terms for the entire corpus (the entire collection of active documents). For each dictionary term, the columns show the dictionary holding the term, the dictionary term, and the raw count of the number of times that term appears in the entire corpus. Each user-defined dictionary is color-coded for convenience.

The third table displays a ranking of the most frequently occurring dictionary terms for each document in the set of active documents. For each dictionary term, the columns show the dictionary holding the term, the dictionary term, and the raw count of the number of times that term appears in the document.
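Both rankings amount to counting each dictionary term, tagging it with its dictionary, and sorting by raw count. A sketch of that logic, under the assumption of simple word tokenization and hypothetical dictionaries (again, not the tool's own code):

```python
import re
from collections import Counter

def term_counts(text, dictionaries):
    """Raw count of each dictionary term, keyed by (dictionary, term)."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for name, terms in dictionaries.items():
        for term in terms:
            counts[(name, term)] = words.count(term.lower())
    return counts

dicts = {"happy": ["joy", "smile"], "sad": ["tears"]}
docs = {"doc1": "joy joy tears", "doc2": "smile and joy"}

# Corpus-wide ranking (second table): sum counts over all documents
corpus = sum((term_counts(t, dicts) for t in docs.values()), Counter())
ranking = corpus.most_common()
print(ranking)  # [(('happy', 'joy'), 3), ...]

# Per-document ranking (third table): one sorted list per document
per_doc = {name: term_counts(t, dicts).most_common()
           for name, t in docs.items()}
```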

Examples

For step-by-step examples, visit our public repository on GitHub at WheatonCS/Lexos/Content_Analysis. From here, you may select whichever experiment you wish to carry out. Each folder contains a README.md file with instructions on how to execute the tests, a FilesToUse folder containing all the files you will need, and a ResultsToExpect folder containing a PDF file with the anticipated results from the analysis.

Example: Sentiment Analysis
After uploading and preparing one or more text files (e.g., a novel), upload at least one user-defined dictionary containing keywords you associate with a particular feeling. This "dictionary" is another text file, uploaded under the Dictionaries menu on the Content Analysis page. Then enter a formula using the provided calculator to determine the "sentiment" of the text. For instance, you might upload a novel and two user-defined dictionaries to determine the tone of the literary text.
Text File: happy very happy happy very sad sad happy happy happy sad
Dictionary 1: happy, very happy
Dictionary 2: sad, very sad
Formula: [happy] - [sad]
After clicking the Analyze button, you should see that the text has a happier tone, since it uses more phrases from Dictionary 1 (the "happy" dictionary) than from Dictionary 2 (the "sad" dictionary).
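Because Dictionary 1 contains the two-word phrase "very happy", the counts depend on how multi-word terms are matched. The sketch below assumes a greedy, longest-match-first, non-overlapping scan; Lexos's actual matching rules may differ. Under that assumption, the example text yields 6 happy matches and 3 sad matches, so [happy] - [sad] is positive.

```python
def count_phrases(tokens, phrases):
    """Greedy longest-match-first scan; counts non-overlapping matches.
    (An assumption -- the tool's actual matching rules may differ.)"""
    phrase_tokens = sorted((p.split() for p in phrases),
                           key=len, reverse=True)
    count, i = 0, 0
    while i < len(tokens):
        for p in phrase_tokens:
            if tokens[i:i + len(p)] == p:
                count += 1
                i += len(p)
                break
        else:
            i += 1
    return count

text = "happy very happy happy very sad sad happy happy happy sad".split()
happy = count_phrases(text, ["happy", "very happy"])  # 6 matches
sad = count_phrases(text, ["sad", "very sad"])        # 3 matches
print(happy - sad)  # 3, a positive (happier) result
```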
 
