In the Margins

The Rolling Windows Tool

Rolling window analysis is a method of tracing the frequency of terms within a designated window of tokens over the course of a document. It can be used to identify small- and large-scale patterns of usage of individual features or to compare these patterns for multiple features. Rolling window analysis tabulates term frequency as part of a continuously moving metric, rather than in discrete segments. Beginning with the selection of a window, say 100 tokens, rolling window analysis traces the frequency of a term's occurrence first within tokens 1-100, then 2-101, then 3-102, and so on until the end of the document is reached. The result can be plotted as a line graph so that it is possible to observe gradual changes in a token's frequency as the text progresses. Plotting different tokens on the same graph allows us to compare their frequencies.
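The sliding procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the Lexos implementation: the function name `rolling_average`, the whitespace tokenization, and the exact-match comparison are all assumptions made for the example.

```python
def rolling_average(tokens, term, window_size):
    """Return the share of tokens equal to `term` in each successive window.

    Hypothetical helper for illustration only; not part of Lexos.
    """
    if window_size <= 0 or window_size > len(tokens):
        raise ValueError("window_size must be between 1 and the number of tokens")

    # Mark each token position as a hit (1) or a miss (0).
    hits = [1 if token == term else 0 for token in tokens]

    # Count the hits in the first window (tokens 1 to window_size).
    count = sum(hits[:window_size])
    averages = [count / window_size]

    # Slide the window one token at a time: add the token that enters,
    # subtract the token that leaves.
    for start in range(1, len(tokens) - window_size + 1):
        count += hits[start + window_size - 1] - hits[start - 1]
        averages.append(count / window_size)

    return averages


# Assumed tokenization: split on whitespace.
tokens = "the cat saw the dog and the bird".split()
averages = rolling_average(tokens, "the", 4)
print(averages)  # [0.5, 0.25, 0.25, 0.5, 0.25]
```

A document of 8 tokens with a window of 4 yields 5 windows (8 - 4 + 1), so the resulting line has one point per window rather than one per token.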

The Lexos Rolling Windows Tool performs this analysis. It has numerous options which are best understood as part of a workflow. In the Lexos interface, the steps of this workflow are numbered 1-6. Each of these steps is discussed below.

  1. Select Active Document: Lexos performs rolling windows analysis on a single active document at a time. Use the radio buttons to select which document you would like to examine.
  2. Select Calculation Type: Lexos will plot either the average term frequency in each window (Rolling Average) or the ratio of term frequencies if you are examining multiple terms (Rolling Ratio).
  3. Enter Search Terms: These are the terms you wish to plot from your document. Enter up to 6 terms, separated by commas. When Lexos searches your document for these terms, it uses the document text, rather than the Document-Term Matrix (DTM) as its starting point. This means that you can choose to search for strings of text, individual words or terms (separated by spaces), or regular expressions (regex). A basic tutorial for using regex can be found at https://regexone.com/.
  4. Define Window: This is where you set the size of the window you want to use. It can consist of any number of characters, tokens (separated by spaces), or lines (separated by line breaks in the text). If your document contains milestones, click the checkbox, and the location of each milestone will be indicated on the rolling window graph by a vertical line.
  5. Choose Display Options: The Hide Individual Points option (turned on by default) produces an uninterrupted line graph, which may be easier to read. Turning this option off will show the points where each term occurs in the document. Mousing over a point will display the location of the term in the token sequence (starting from 0), along with the average or ratio at that point in the window. The Black and White Only option produces a non-color version of the graph that is suitable for downloading and publishing in journals.
  6. Get Graph: Click the Get Graph button to generate the Rolling Windows graph. Once it has been generated, the screen will scroll automatically to the top of the graph. Download buttons will also appear both above and below the graph. You can download the data by clicking the CSV Matrix button. This will give you a comma-separated values (CSV) file, which you can open in a spreadsheet program. To download the image, click either of the SVG buttons as appropriate for your browser. A new tab will open, and you can save it by right-clicking and saving the page.
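To make steps 2-4 more concrete, the following Python sketch counts regex matches directly in the document text and computes a ratio for each window of characters, tokens, or lines. It is an illustration under stated assumptions rather than a description of how Lexos works internally: the names `split_units` and `rolling_ratio` are hypothetical, and the ratio is assumed here to be a / (a + b), where a and b are the match counts of the two patterns in the window (with 0.0 when neither occurs).

```python
import re


def split_units(text, unit):
    """Split text into window units; returns (units, joiner). Hypothetical helper."""
    if unit == "characters":
        return list(text), ""
    if unit == "tokens":
        return text.split(), " "
    if unit == "lines":
        return text.splitlines(), "\n"
    raise ValueError("unit must be 'characters', 'tokens', or 'lines'")


def rolling_ratio(text, pattern_a, pattern_b, window_size, unit="tokens"):
    """Return a / (a + b) for each window (assumed definition of the ratio)."""
    units, joiner = split_units(text, unit)
    if window_size <= 0 or window_size > len(units):
        raise ValueError("window_size must be between 1 and the number of units")

    regex_a = re.compile(pattern_a)
    regex_b = re.compile(pattern_b)
    ratios = []
    for start in range(len(units) - window_size + 1):
        # Rebuild the window text so the regex searches text, not a term matrix.
        window_text = joiner.join(units[start:start + window_size])
        a = len(regex_a.findall(window_text))
        b = len(regex_b.findall(window_text))
        ratios.append(a / (a + b) if a + b else 0.0)
    return ratios


text = "he he she he she she she he"
ratios = rolling_ratio(text, r"\bhe\b", r"\bshe\b", 4, unit="tokens")
print(ratios)  # [0.75, 0.5, 0.25, 0.25, 0.25]
```

The word-boundary anchors (`\b`) keep the pattern for "he" from matching inside "she", which is the kind of distinction that searching the raw text with regex makes possible. Recounting each window from scratch is simple but slow on long documents; it is used here only for clarity.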

Additional Graph Interactivity

In addition to mousing over points (if you have turned off the Hide Individual Points option), you can drag your mouse over portions of the bottom ribbon to magnify sections of the graph.
