Rolling Window - Formal Description
Formal Description of Rolling Window Averages and Rolling Window Ratios
Rolling Averages
To calculate a rolling average, we first select a window size, w, which is significantly smaller than T, the total number of units (e.g. characters, words, or lines) in the document. The first window in the document thus consists of units 1 through w. We count the number of features of interest, n, within this first window and divide by the window size, giving us the average number of features per unit (p = n/w). From this information, we produce a data pair composed of the ordinal number of the window, k, and the value of p: (k, pk). So for the first window, the resulting data pair is (1, p1). We then shift the window one unit towards the end of the text by incrementing both the initial and final units in the window by 1 (so that the window now spans units 2 through w+1), tabulate the number of times the feature of interest appears in the shifted window, and calculate p2 = n2/w, producing a new data pair, (2, p2). This process is repeated, moving the window through the text until the leading edge of the window meets the end of the text (i.e., where k+w-1 = T), producing a set of T-w+1 coordinates in the form (k, pk).
Formally, the value of p at any location k is equal to:
\[p_{k} = \frac{1}{w}\sum_{i = k}^{k + w - 1} n_{i},\quad k + w - 1 \leq T\]
where k is the ordinal number of the first unit in the window
w is the size of the window in units
n_i is the number of features of interest in unit i, so that the sum is n, the total number of features of interest in the window
T is the total number of units in the text.
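The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Lexos implementation: the function name rolling_average and the is_feature callback are hypothetical, the text is assumed to be already split into units, and each unit is assumed to contain the feature either once or not at all.

```python
def rolling_average(units, is_feature, w):
    """Return [(k, p_k)] for a forward-looking window of w units.

    k is 1-based, matching the description in the text.
    `is_feature` is a hypothetical callback: unit -> bool.
    """
    T = len(units)
    if not 0 < w <= T:
        raise ValueError("window size must satisfy 0 < w <= T")

    # n_i for every unit: 1 if the unit is a feature of interest, else 0.
    flags = [1 if is_feature(u) else 0 for u in units]

    count = sum(flags[:w])            # n for the first window (units 1..w)
    pairs = [(1, count / w)]

    for k in range(2, T - w + 2):     # k = 2 .. T - w + 1
        # Slide by one unit: unit k+w-1 enters, unit k-1 leaves
        # (1-based positions; subtract 1 for Python's 0-based indices).
        count += flags[k + w - 2] - flags[k - 2]
        pairs.append((k, count / w))

    return pairs


# Usage: proportion of the letter "a" per 4-character window.
pairs = rolling_average(list("abracadabra"), lambda c: c == "a", 4)
```

Updating the running count as the window slides avoids recounting every window from scratch, but the result is the same as applying p = n/w to each window independently.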
Rolling Ratios
To calculate a rolling ratio, we first select a window size, w, which is significantly smaller than T, the total number of units (e.g. characters, words, or lines) in the document. The first window in the document thus consists of units 1 through w. If we have two mutually exclusive features of interest, m and r, we can calculate their ratio, q, by dividing the number of appearances of the first feature by the total number of appearances of both features in a given window: q = m/(m+r). From this information we produce a data pair composed of k, the ordinal number of the window, and the value of q. For the first window, therefore, the resulting data pair is (1, q1). We then shift the window one unit towards the end of the text by incrementing both the initial and final units in the window by 1, tabulate the number of times each feature of interest appears in the shifted window, and calculate q2 = m2/(m2+r2), producing a new data pair, (2, q2). This process is repeated, moving the window through the text until the leading edge of the window meets the end of the text (i.e., where k+w-1 = T), producing a set of T-w+1 coordinates in the form (k, qk).
Formally, the value of q at any location k is equal to:
\[q_{k} = \frac{\sum_{i = k}^{k + w - 1} m_{i}}{\sum_{i = k}^{k + w - 1} m_{i} + \sum_{i = k}^{k + w - 1} r_{i}},\quad k + w - 1 \leq T\]
where k is the ordinal number of the first unit in the window
w is the size of the window in units
m_i is the number of appearances of the first feature of interest in unit i, so that its sum is m, the total for the window
r_i is the number of appearances of the second feature of interest in unit i, so that its sum is r, the total for the window
T is the total number of units in the text.
Because the denominator is the combined count m + r rather than the count of one feature alone, a rolling ratio avoids the problem of a zero in the denominator so long as the window is large enough that no window completely lacks both features.
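As an illustration, the rolling ratio can be sketched in Python as follows. The names rolling_ratio, is_m, and is_r are hypothetical and do not come from Lexos. Returning None for a window that contains neither feature is our own assumption about how to handle the zero-denominator case; other choices (such as skipping the window) are equally possible.

```python
def rolling_ratio(units, is_m, is_r, w):
    """Return [(k, q_k)] with q_k = m / (m + r) for each window of w units.

    k is 1-based. `is_m` and `is_r` are hypothetical callbacks (unit -> bool)
    identifying the two mutually exclusive features of interest.
    """
    T = len(units)
    if not 0 < w <= T:
        raise ValueError("window size must satisfy 0 < w <= T")

    pairs = []
    for k in range(1, T - w + 2):          # k = 1 .. T - w + 1
        window = units[k - 1:k - 1 + w]    # units k .. k+w-1 (1-based)
        m = sum(1 for u in window if is_m(u))
        r = sum(1 for u in window if is_r(u))
        # Assumption: a window containing neither feature has no defined
        # ratio, so it is reported as None rather than raising an error.
        q = m / (m + r) if (m + r) > 0 else None
        pairs.append((k, q))
    return pairs


# Usage: ratio of "the" to ("the" + "and") per 3-word window.
words = "the cat and the dog and a cow".split()
pairs = rolling_ratio(words, lambda t: t == "the", lambda t: t == "and", 3)
```

For clarity this version recounts each window in full; a running count like the one used for rolling averages would give the same values.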
Window Orientation
Using a forward-looking window, it is not possible to calculate the average or ratio for the final w-1 units of the text, since a window beginning at any of these units would extend past the end of the text. This span is lost to the analysis. Using a backward-looking window that begins at the end of the text and moves towards the beginning will capture this data, but at the cost of losing the first w-1 units. A centered window would start at the (w/2)th point in the document and end at the (T-(w/2))th point. This method leaves unanalyzed segments at both the beginning and the end of the document, each only about half the size (w/2 units long) of the segment left by the forward- or backward-looking windows.
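One way to see the difference between the three orientations is that the same T-w+1 windows are computed in every case; what changes is the unit to which each window's value is attributed. The sketch below makes this concrete. The function labelled_windows is hypothetical, and for the centered case it assumes an odd window size so that the window extends equally on both sides of its label.

```python
def labelled_windows(T, w, orientation="forward"):
    """Return [(label, first_unit, last_unit)] with 1-based positions.

    forward:  the value is attributed to the first unit of the window
    backward: the value is attributed to the last unit of the window
    centered: the value is attributed to the middle unit (assumes odd w)
    """
    offsets = {"forward": 0, "backward": w - 1, "centered": w // 2}
    if orientation not in offsets:
        raise ValueError("unknown orientation: " + orientation)
    if not 0 < w <= T:
        raise ValueError("window size must satisfy 0 < w <= T")

    offset = offsets[orientation]
    return [(start + offset, start, start + w - 1)
            for start in range(1, T - w + 2)]


# Usage with T = 10 and w = 5: which units receive a value?
forward = [label for label, _, _ in labelled_windows(10, 5, "forward")]
backward = [label for label, _, _ in labelled_windows(10, 5, "backward")]
centered = [label for label, _, _ in labelled_windows(10, 5, "centered")]
```

With T = 10 and w = 5, the forward-looking labels run from 1 to 6 (the last w-1 = 4 units receive no value), the backward-looking labels run from 5 to 10 (the first 4 units receive no value), and the centered labels run from 3 to 8 (2 units at each end receive no value).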
Currently, Lexos is only able to produce forward-looking window analyses.