
Rolling Window - Formal Description

Formal Description of Rolling Window Averages and Rolling Window Ratios

Rolling Averages

To calculate a rolling average, we first select a window size, w, which is significantly smaller than T, the total number of units (e.g. characters, words, or lines) in the document. The first window in the document thus consists of units 1 through w. We count the number of features of interest, n, within this first window and divide by the window size, giving us the average number of features per unit (p = n/w). From this information, we produce a data pair comprising the ordinal number of the window, k, and the value of p: (k, p_k). So for the first window, the resulting data pair is (1, p₁). We then shift the window one unit towards the end of the text by incrementing both the initial and final units in the window by 1 (from units k through k+w-1 to units k+1 through k+w), tabulate the number of times the feature of interest appears in the shifted window, and calculate p₂ = n₂/w, producing a new data pair, (2, p₂). This process is repeated, moving the window through the text until the leading edge of the window meets the end of the text (i.e., where k+w-1 = T), producing a set of T-w+1 coordinates in the form (k, p_k).

Formally, the value of p at any location k is equal to:

\[p_{k} = \frac{1}{w}\sum_{i = k}^{k + w - 1} n_{i},\quad k + w - 1 \leq T\]

where k is the ordinal number of the first unit in the window

w is the size of the window in units

n_i is the number of occurrences of the feature of interest in unit i, so that the sum is n, the total number of features of interest in the window

T is the total number of units in the text.
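
The procedure above can be sketched in Python. This is a minimal illustration, not the Lexos implementation: the function name `rolling_average` and the predicate `is_feature` are hypothetical, and the sketch assumes the text has already been split into units and that each unit either is or is not a feature of interest.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_average(
    units: Sequence[str],
    is_feature: Callable[[str], bool],
    w: int,
) -> List[Tuple[int, float]]:
    """Return the pairs (k, p_k) for a forward-looking window of w units.

    Window k (1-based) covers units k through k+w-1, so there are
    T - w + 1 windows in a text of T units.
    """
    T = len(units)
    if not 0 < w <= T:
        raise ValueError("window size w must satisfy 0 < w <= T")

    # 1 where the unit is a feature of interest, 0 otherwise.
    hits = [1 if is_feature(u) else 0 for u in units]

    # Count for the first window (units 1 through w).
    n = sum(hits[:w])
    pairs = [(1, n / w)]

    # Shift the window one unit at a time: add the unit that enters
    # the window and drop the unit that leaves it.
    for k in range(2, T - w + 2):
        n += hits[k + w - 2] - hits[k - 2]
        pairs.append((k, n / w))
    return pairs
```

For example, with characters as units, the letter "a" as the feature, and w = 4, the text "abracadabra" (T = 11) yields 11 - 4 + 1 = 8 pairs, beginning (1, 0.5), (2, 0.25).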

Rolling Ratios

To calculate a rolling ratio, we first select a window size, w, which is significantly smaller than T, the total number of units (e.g. characters, words, or lines) in the document. The first window in the document thus consists of units 1 through w. If we have two mutually exclusive features of interest, m and r, we can calculate their ratio, q, by dividing the number of appearances of the first feature by the total number of appearances of both features in a given window: q = m/(m+r). From this information we produce a data pair comprising k, the ordinal number of the window, and the value of q. For the first window, therefore, the resulting data pair is (1, q₁). We then shift the window one unit towards the end of the text by incrementing both the initial and final units in the window by 1, tabulate the number of times each feature of interest appears in the shifted window, and calculate q₂ = m₂/(m₂+r₂), producing a new data pair, (2, q₂). This process is repeated, moving the window through the text until the leading edge of the window meets the end of the text (i.e., where k+w-1 = T), producing a set of T-w+1 coordinates in the form (k, q_k).

Formally, the value of q at any location k is equal to:

\[q_{k} = \frac{\sum_{i = k}^{k + w - 1} m_{i}}{\sum_{i = k}^{k + w - 1} m_{i} + \sum_{i = k}^{k + w - 1} r_{i}},\quad k + w - 1 \leq T\]

where k is the ordinal number of the first unit in the window

w is the size of the window in units

m_i is the number of occurrences of the first feature of interest in unit i, so that its sum is m, the total number of the first feature in the window

r_i is the number of occurrences of the second feature of interest in unit i, so that its sum is r, the total number of the second feature in the window

T is the total number of units in the text.

A rolling ratio avoids the problem of a zero in the denominator of a fraction so long as the window is large enough that no window lacks both features entirely.
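
The rolling ratio can be sketched the same way. Again, this is only an illustration under stated assumptions, not the Lexos implementation: the name `rolling_ratio` and the predicates `is_m` and `is_r` are hypothetical, and returning `None` for a window that contains neither feature is one possible way to handle a zero denominator.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rolling_ratio(
    units: Sequence[str],
    is_m: Callable[[str], bool],
    is_r: Callable[[str], bool],
    w: int,
) -> List[Tuple[int, Optional[float]]]:
    """Return the pairs (k, q_k), where q_k = m / (m + r) in window k.

    Window k (1-based) covers units k through k+w-1. If a window
    contains neither feature, q_k is reported as None (an assumption
    of this sketch) instead of dividing by zero.
    """
    T = len(units)
    if not 0 < w <= T:
        raise ValueError("window size w must satisfy 0 < w <= T")

    m_hits = [1 if is_m(u) else 0 for u in units]
    r_hits = [1 if is_r(u) else 0 for u in units]

    pairs: List[Tuple[int, Optional[float]]] = []
    for k in range(1, T - w + 2):
        start = k - 1  # convert the 1-based window number to a 0-based index
        m = sum(m_hits[start:start + w])
        r = sum(r_hits[start:start + w])
        q = m / (m + r) if (m + r) > 0 else None
        pairs.append((k, q))
    return pairs
```

For clarity this version recounts each window from scratch; the running-count update used for a rolling average applies equally well here.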

Window Orientation

Using a forward-looking window, it is not possible to calculate the average or ratio for the final w-1 units of the text, since this span of units is shorter than w, the size of the window. This span is lost to the analysis. Using a backward-looking window that begins at the end of the text and moves towards the beginning will capture this data, but at the cost of losing the first w-1 units. A centered window would start at the (w/2)th point in the document and end at the (T-(w/2))th point. This method leaves unanalyzed segments at both the beginning and the end of the document, each only half the size (w/2 units long) of those left by the forward- or backward-looking windows.
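
The effect of orientation on which units receive a value can be illustrated with a short Python sketch. The function `reported_positions` is hypothetical and is not part of Lexos. It assumes that a forward-looking window attaches its value to the first unit of the window, a backward-looking window to the last unit, and a centered window to the unit floor(w/2) places after the first, which only approximates the w/2 split described above when w is even.

```python
def reported_positions(T: int, w: int, orientation: str = "forward") -> range:
    """Return the 1-based unit positions that receive a window value.

    Every orientation produces T - w + 1 values; the orientations differ
    only in which w - 1 units are left without a value.
    """
    if not 0 < w <= T:
        raise ValueError("window size w must satisfy 0 < w <= T")

    # Offset from the first unit of the window to the unit that the
    # window's value is attached to (an assumption of this sketch).
    offsets = {"forward": 0, "centered": w // 2, "backward": w - 1}
    if orientation not in offsets:
        raise ValueError("orientation must be forward, centered, or backward")

    offset = offsets[orientation]
    return range(1 + offset, T - w + 2 + offset)
```

With T = 10 and w = 5, the forward-looking window reports values for units 1 through 6, the backward-looking window for units 5 through 10, and the centered window for units 3 through 8, leaving two units unanalyzed at each end.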

Currently, Lexos is only able to produce forward-looking window analyses.