In the Margins

The Statistics Tool

The Lexos Statistics tool provides a basic statistical overview of your collection, complementing the detailed Document-Term Matrix (DTM) available in the Tokenizer.

Statistics generates a table containing, for each document, the number of distinct terms, the number of terms occurring only once (hapax legomena), the total term count, and the average term frequency. You may generate statistics for all of your active files or select a subset using the Select Document(s) checkboxes. All of the Advanced Options for manipulating the DTM are available. When you have chosen your settings, click the Generate Statistics button.
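The four per-document columns can be sketched in a few lines of Python. This is a minimal illustration, not Lexos's implementation: it assumes a simple lowercase, word-character tokenizer (the hypothetical `tokenize` helper) and assumes that average term frequency means the total term count divided by the number of distinct terms.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase the text and keep runs of word
    # characters. Lexos's own tokenizer and Advanced Options may differ.
    return re.findall(r"\w+", text.lower())


def document_stats(text: str) -> dict[str, float]:
    counts = Counter(tokenize(text))                    # term -> occurrences
    total = sum(counts.values())                        # total term count
    distinct = len(counts)                              # number of distinct terms
    hapax = sum(1 for n in counts.values() if n == 1)   # terms occurring once
    # Assumption: average term frequency = total terms / distinct terms.
    average = total / distinct if distinct else 0.0
    return {
        "distinct_terms": distinct,
        "hapax_legomena": hapax,
        "total_terms": total,
        "average_term_frequency": average,
    }


print(document_stats("The cat sat on the mat"))
```

For "The cat sat on the mat", "the" occurs twice and the other four terms once each, giving 5 distinct terms, 4 hapax legomena, 6 terms in total, and an average term frequency of 1.2.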

Using the Statistics Table

The statistics table can be sorted by any column by clicking its header. An icon indicates which column is being used for sorting and whether the sort is ascending or descending. Use the Display dropdown menu to show more than the default 10 rows per page. Click the Copy button to copy the table to your computer's clipboard, or download it as an Excel spreadsheet, a Comma-Separated Values (CSV) file, a Tab-Separated Values (TSV) file, or a PDF.
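If you want to reproduce the sorting and CSV/TSV export outside Lexos, Python's standard `csv` module is enough. The row data and the helpers `sort_rows` and `to_delimited` below are hypothetical and only illustrate the idea; the column names are assumptions and do not necessarily match the headers in Lexos's downloads.

```python
import csv
import io

# Hypothetical rows standing in for the statistics table.
rows = [
    {"document": "a.txt", "distinct_terms": 5, "total_terms": 6},
    {"document": "b.txt", "distinct_terms": 40, "total_terms": 120},
    {"document": "c.txt", "distinct_terms": 12, "total_terms": 30},
]


def sort_rows(rows: list[dict], column: str, descending: bool = False) -> list[dict]:
    # Mirrors clicking a column header: sort the rows on that column's values.
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def to_delimited(rows: list[dict], delimiter: str = ",") -> str:
    # Serialize rows with a header line; use "," for CSV or "\t" for TSV.
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_delimited(sort_rows(rows, "total_terms", descending=True), delimiter="\t"))
```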

Statistics for the Entire Corpus

When you generate the statistics table, Lexos also calculates the average and median size of your documents (measured in term counts). It uses this information to run a Standard Error test that checks whether any document is anomalously large or small. If a document's size lies more than two standard deviations from the average, Lexos displays a warning that the document is particularly large or small compared to the rest of your corpus.
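The size check can be sketched with Python's `statistics` module. This is an illustration under stated assumptions, not Lexos's code: document size is a total term count, the standard deviation is the population standard deviation (`statistics.pstdev`), and the hypothetical `corpus_size_summary` function flags any document lying more than two standard deviations from the mean. Lexos's exact calculation may differ.

```python
import statistics


def corpus_size_summary(sizes: dict[str, int], k: float = 2.0) -> dict:
    # sizes maps a document name to its total term count.
    values = list(sizes.values())
    mean = statistics.mean(values)
    median = statistics.median(values)
    # Assumption: population standard deviation; Lexos may use another estimator.
    sd = statistics.pstdev(values)
    flags = {}
    for name, size in sizes.items():
        if size > mean + k * sd:
            flags[name] = "large"
        elif size < mean - k * sd:
            flags[name] = "small"
    return {"mean": mean, "median": median, "std_dev": sd, "flags": flags}


# Hypothetical corpus: nine documents of 100 terms and one of 500.
sizes = {f"doc{i}.txt": 100 for i in range(1, 10)}
sizes["doc10.txt"] = 500
print(corpus_size_summary(sizes))
```

Here the mean is 140, the median 100, and the standard deviation 120, so only doc10.txt (above 140 + 2 × 120 = 380) is flagged as large. Note that with the population standard deviation, no document in a corpus of five or fewer documents can lie more than two standard deviations from the mean, so this sketch never flags anything in very small collections.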
