The Content Analysis Tool
The Content Analysis tool has various applications, such as opinion mining, sentiment analysis, and determining organizational hardiness in stockbroker reports.
Usage
- Upload the documents you wish to explore using Lexos' Upload page, as usual, and scrub the documents if necessary.
- On this Content Analysis page, upload your dictionary file(s).
- Build a formula to relate the scores obtained from each dictionary (e.g., [happy] - [sad]).
- Click on the Analyze button.
- Review the right-most column in the initial result table to see your "score".
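The formula step can be illustrated with a minimal sketch. This is not Lexos's own implementation: the function name `evaluate_formula`, the set of supported operators, and the counts used below are all assumptions made for illustration. It only shows the idea of replacing each bracketed dictionary label with that dictionary's count and then evaluating the arithmetic.

```python
import ast
import operator
import re

# Arithmetic operators allowed in a formula (an assumption for this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_formula(formula, counts):
    """Replace each [label] with its count, then evaluate the arithmetic."""
    def lookup(match):
        # Raises KeyError if the formula names a dictionary with no count.
        return str(counts[match.group(1)])

    expression = re.sub(r"\[([^\]]+)\]", lookup, formula)

    def walk(node):
        # Only numbers and basic arithmetic are accepted; anything else fails.
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression in formula")

    return walk(ast.parse(expression, mode="eval"))


# Hypothetical counts for one document.
print(evaluate_formula("[happy] - [sad]", {"happy": 12, "sad": 5}))  # prints 7
```

Walking the parsed expression instead of calling `eval` keeps the sketch limited to plain arithmetic, so a formula cannot run arbitrary code.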
Interpreting the Results
Three groups of tables will be displayed as results:
Assuming happy and sad dictionaries are in use, the first table displays, for each document, the number of happy and sad terms present, the computed formula value, the total word count, and the score, which is the formula result divided by the total word count. The table also includes the average for each of these columns.
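As a rough illustration of how the columns of the first table relate to each other, the sketch below builds one row per document and an average row. The document names, counts, and the `summarize` helper are hypothetical, and the formula is fixed to [happy] - [sad]; the zero-length guard is also an assumption of this sketch.

```python
def summarize(documents):
    """Build first-table rows: counts, formula value, word count, and score."""
    rows = []
    for name, doc in documents.items():
        formula_value = doc["happy"] - doc["sad"]  # formula: [happy] - [sad]
        words = doc["words"]
        # Score = formula result divided by total word count (0.0 if empty).
        score = formula_value / words if words else 0.0
        rows.append({
            "document": name,
            "happy": doc["happy"],
            "sad": doc["sad"],
            "formula": formula_value,
            "words": words,
            "score": score,
        })
    columns = ["happy", "sad", "formula", "words", "score"]
    average = {c: sum(r[c] for r in rows) / len(rows) for c in columns}
    return rows, average


# Hypothetical counts for two documents.
rows, average = summarize({
    "doc_a.txt": {"happy": 30, "sad": 10, "words": 1000},
    "doc_b.txt": {"happy": 5, "sad": 15, "words": 500},
})
for row in rows:
    print(row["document"], row["formula"], row["score"])
print(average)
```

Dividing by the word count makes documents of different lengths comparable: here doc_a.txt scores 0.02 and doc_b.txt scores -0.02, even though their raw formula values are 20 and -10.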
The second table displays a ranking of the most frequently occurring dictionary terms for the entire corpus (the entire collection of active documents). For each dictionary term, the columns show the dictionary holding the term, the dictionary term, and the raw count of the number of times that term appears in the entire corpus. Each user-defined dictionary is color-coded for convenience.
The third table displays a ranking of the most frequently occurring dictionary terms for each document in the set of active documents. For each dictionary term, the columns show the dictionary holding the term, the dictionary term, and the raw count of the number of times that term appears in the document.
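The two ranking tables can be sketched with Python's `collections.Counter`. The terms, counts, and the `ranked` helper below are hypothetical; the sketch only shows how per-document counts could be summed into a corpus-wide ranking and how each ranking yields rows of dictionary, term, and count.

```python
from collections import Counter

# Hypothetical per-document counts, keyed by (dictionary, term).
per_document = {
    "doc_a.txt": Counter({("happy", "joy"): 4, ("sad", "grief"): 1}),
    "doc_b.txt": Counter({("happy", "joy"): 3, ("sad", "grief"): 5, ("sad", "tears"): 2}),
}


def ranked(counter):
    """Return rows of (dictionary, term, count), most frequent first."""
    return [(dictionary, term, n) for (dictionary, term), n in counter.most_common()]


# Second table: counts summed over every active document.
corpus = sum(per_document.values(), Counter())
print(ranked(corpus))
# prints [('happy', 'joy', 7), ('sad', 'grief', 6), ('sad', 'tears', 2)]

# Third table: one ranking per document.
for name, counter in per_document.items():
    print(name, ranked(counter))
```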
Examples
For step-by-step examples, visit our public repository on GitHub: WheatonCS/Lexos/Content_Analysis. From there, you may select whichever experiment you wish to carry out. Each folder contains a README.md file with instructions on how to execute the tests, a FilesToUse folder containing all the files you will need, and a ResultsToExpect folder containing a PDF file with the anticipated results from the analysis.
Example: Sentiment Analysis
After uploading and preparing one or more text files (e.g., a novel), upload at least one of your own user-defined dictionaries containing keywords that you tend to associate with a particular feeling. This "dictionary" is another text file, uploaded under the dictionaries menu on the Content Analysis page. Then enter a formula using the provided calculator to determine the "sentiment" of the text. For instance, you may choose to upload a novel and two of your own user-defined dictionaries in order to determine the tone of the literary text.
Text File: happy very happy happy very sad sad happy happy happy sad
Dictionary 1: happy, very happy
Dictionary 2: sad, very sad
Formula: [happy] - [sad]
After clicking Analyze, you might see that the text appears to have a happier tone, since it uses more phrases from Dictionary 1 (the "happy" dictionary) than from Dictionary 2 (the "sad" dictionary).
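To make the arithmetic in this example concrete, here is a minimal sketch that counts the dictionary phrases in the sample text. The matching rule is an assumption, not a description of Lexos itself: longer phrases are matched first and each word is counted at most once, so the "happy" inside "very happy" is not counted a second time. The function name `count_dictionary_terms` and the labels "happy" and "sad" are also hypothetical.

```python
def count_dictionary_terms(text, dictionaries):
    """Count non-overlapping dictionary phrases, longest phrases first."""
    tokens = text.split()
    used = [False] * len(tokens)  # marks words already claimed by a phrase

    # (phrase tokens, dictionary label) pairs, longest phrases first.
    entries = sorted(
        ((term.split(), label)
         for label, terms in dictionaries.items()
         for term in terms if term.strip()),
        key=lambda entry: len(entry[0]),
        reverse=True,
    )

    counts = {label: 0 for label in dictionaries}
    for phrase, label in entries:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if not any(used[i:i + n]) and tokens[i:i + n] == phrase:
                used[i:i + n] = [True] * n
                counts[label] += 1
    return counts, len(tokens)


text = "happy very happy happy very sad sad happy happy happy sad"
dictionaries = {
    "happy": ["happy", "very happy"],  # Dictionary 1
    "sad": ["sad", "very sad"],        # Dictionary 2
}

counts, total_words = count_dictionary_terms(text, dictionaries)
formula_value = counts["happy"] - counts["sad"]  # [happy] - [sad]
score = formula_value / total_words

print(counts)           # prints {'happy': 6, 'sad': 3}
print(total_words)      # prints 11
print(formula_value)    # prints 3
print(round(score, 4))  # prints 0.2727
```

Under this assumption the formula value is positive, which matches the happier tone described above. If the "happy" inside "very happy" and the "sad" inside "very sad" were also counted on their own, the counts would be 7 and 4, and the formula value would still be 3.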