In the Margins

Pre-Processing

Pre-processing topics go in this path. This page should contain a general discussion of the pre-processing workflow.

Reminder: somewhere in this path, add the advice that you should always examine your token list in Tokenize, because your text may have been improperly split into tokens. A particularly common problem is merged words, where two or more words run together into a single token. This issue still needs a fuller discussion somewhere.
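
As a rough illustration of what checking a token list for merged words might look like outside the Tokenize tool, here is a minimal Python sketch. It is not how Tokenize works internally. The function names `tokenize` and `find_merged_candidates`, the `min_part_len` parameter, and the sample sentence are all hypothetical, and the heuristic (flag any token that splits cleanly into two other tokens found in the same text) is only one possible approach.

```python
import re


def tokenize(text):
    """Split text into lowercase alphabetic tokens.

    A simple stand-in for a real tokenizer (assumption: ASCII letters only).
    """
    return re.findall(r"[a-z]+", text.lower())


def find_merged_candidates(tokens, min_part_len=3):
    """Return tokens that split into two other tokens seen in the same list.

    The result maps each suspicious token to the (left, right) split found.
    min_part_len is a hypothetical threshold that avoids trivial splits.
    """
    vocab = set(tokens)
    candidates = {}
    for token in vocab:
        # Try every split point that leaves both halves long enough.
        for i in range(min_part_len, len(token) - min_part_len + 1):
            left, right = token[:i], token[i:]
            if left in vocab and right in vocab:
                candidates[token] = (left, right)
                break
    return candidates


# Hypothetical sample with one merged word ("quickbrown").
sample = "The quick brown fox saw the quickbrown dog"
merged = find_merged_candidates(tokenize(sample))
for token, parts in merged.items():
    print(token, "may be a merge of", parts)
```

A heuristic like this will both miss real merges (when the parts never occur separately in the text) and flag legitimate compounds, so it can only suggest tokens to inspect by hand. It does not replace reading through the token list.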

This page has paths:

  1. Topics Scott Kleinman

Contents of this path:

  1. Scrubbing
  2. Cutting
  3. Tokenisation