In the Margins

The Scrub Tool

The Scrub tool allows you to make document-wide edits, such as removing punctuation, digits, and capital letters, and lets you choose how hyphens and possessive apostrophes are handled. Additionally, Scrub accepts lists of stopwords, lemmas, consolidations (character replacements), and special characters, each of which is explained in more detail below.

Using Lexos
The first part of the scrubbing process involves several simple options that affect the entire document. In almost all cases, their names make their function clear, so the important thing to remember is that they take effect in each of your selected (active) documents or document segments.

Remove punctuation:
Every Unicode character carries a set of metadata that classifies its "type" of character. If this option is selected, any Unicode character in each of the active texts that has a Punctuation Character Property (its category code begins with 'P') or a Symbol Character Property (its category code begins with 'S') is removed. The specific punctuation and symbol categories that are removed are listed below:
 
P   Punctuation
Pc  Connector punctuation
Pd  Dash punctuation
Pe  Close punctuation
Pf  Final punctuation
Pi  Initial punctuation
Po  Other punctuation
Ps  Open punctuation
S   Symbol
Sc  Currency symbol
Sk  Modifier symbol
Sm  Mathematical symbol
So  Other symbol
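
The category-based rule described above can be sketched in a few lines of Python. This is only an illustration, not Lexos's own code: the function name remove_punctuation is hypothetical, and the sketch assumes that checking the first letter of the category code returned by Python's standard unicodedata module matches the 'P' and 'S' test described here.

```python
import unicodedata


def remove_punctuation(text: str) -> str:
    """Drop every character whose Unicode category code starts with P or S.

    Hypothetical helper for illustration; it is not taken from Lexos.
    """
    return "".join(
        ch for ch in text
        # unicodedata.category returns a two-letter code such as "Po" or "Sc";
        # the first letter identifies the major class (P = punctuation, S = symbol).
        if unicodedata.category(ch)[0] not in ("P", "S")
    )


if __name__ == "__main__":
    print(remove_punctuation("Hello, world!"))
```

Note that a rule like this removes hyphens and apostrophes along with everything else in the 'P' classes, and leaves the surrounding whitespace untouched.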
If Remove Punctuation is selected, two additional options are presented:


Make lowercase:
 
