In the Margins

The Similarity Query Tool

When you want to rank how "close" a single document is to every other document in your active set, Similarity Query is a good first probe. The rankings are based on the "distance" between documents. Distances near zero indicate documents that are similar, and dissimilar documents have distances closer to one. Similarity Query, as implemented here, is a variant of cosine similarity.
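To make the distance scale concrete, here is a minimal sketch of textbook cosine distance (one minus cosine similarity) between two term-count vectors. It is an illustration only, not Lexos's own code, and Lexos's variant may differ in its details. The function name `cosine_distance` and the three count vectors are hypothetical. Term counts are never negative, so this distance always falls between 0 and 1.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two term-count vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Counts of the same three terms in three hypothetical documents.
doc_a = [4, 2, 0]
doc_b = [8, 4, 0]  # same proportions as doc_a, but twice as long
doc_c = [0, 0, 5]  # shares no terms with doc_a

print(cosine_distance(doc_a, doc_b))  # essentially zero (tiny floating-point residue)
print(cosine_distance(doc_a, doc_c))  # 1.0: no vocabulary in common
```

Note that `doc_b` is "similar" to `doc_a` even though it is twice as long. Cosine distance compares the direction of the two vectors, not their size.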

Getting the Results of Similarity Query

  1. On the left, select the radio button for the document that will serve as the comparison document. Every other active document will be compared against it.
  2. In the panel on the right, you may configure the Advanced Options for the Document-Term Matrix (DTM). Note: cosine similarity depends only on the relative proportions of terms in each document, not on document length, so no Normalization options are offered here.
  3. Click the Get Similarity Rankings button. The results appear below in a table, which you can sort by clicking a column header. An icon indicates which column is being used for sorting and whether the sort is ascending or descending. (The first click sorts that column in ascending order; click again to sort in descending order.)
  4. The table can be copied to your clipboard or downloaded as an Excel, comma-separated values (CSV), or PDF file. A downloaded file containing all the results is saved to your local download folder, where you can open it in, for instance, a spreadsheet program for further work.
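The steps above can be approximated outside the tool. The sketch below builds a small document-term matrix of raw counts and ranks every other document by cosine distance from one comparison document. It then writes the ranking as CSV text. Everything here is a hypothetical stand-in: the function names, the file names, and the simple whitespace tokenization. Lexos's own tokenizer, DTM options, and distance variant may differ. The last lines also check the claim in step 2: converting counts to proportions leaves the distances unchanged.

```python
import csv
import io
import math
from collections import Counter


def build_dtm(docs: dict[str, list[str]]) -> dict[str, list[int]]:
    """Hypothetical helper: raw term counts per document over a shared, sorted vocabulary."""
    vocab = sorted({tok for tokens in docs.values() for tok in tokens})
    dtm = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        dtm[name] = [counts[term] for term in vocab]
    return dtm


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; near 0 for similar documents, 1 for no shared terms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity_rankings(dtm: dict[str, list[float]], comparison: str) -> list[tuple[str, float]]:
    """Compare every other document to the chosen one, closest first (ascending sort)."""
    target = dtm[comparison]
    rows = [(name, cosine_distance(target, vec))
            for name, vec in dtm.items() if name != comparison]
    rows.sort(key=lambda row: row[1])
    return rows


def rankings_to_csv(rows: list[tuple[str, float]]) -> str:
    """Render the ranking table as CSV text, like the tool's CSV download."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Document", "Cosine distance"])
    for name, dist in rows:
        writer.writerow([name, f"{dist:.4f}"])
    return buf.getvalue()


# Hypothetical, already-tokenized documents.
docs = {
    "alpha.txt": "the cat sat on the mat".split(),
    "beta.txt": "the cat sat on the hat".split(),
    "gamma.txt": "stocks fell sharply today".split(),
}
dtm = build_dtm(docs)
rows = similarity_rankings(dtm, "alpha.txt")
print(rankings_to_csv(rows))

# Dividing each row by its total (proportions) yields the same distances,
# which is why a Normalization option would have no effect here.
props = {}
for name, vec in dtm.items():
    total = sum(vec)
    props[name] = [c / total for c in vec]
prop_rows = similarity_rankings(props, "alpha.txt")
```

With `alpha.txt` as the comparison document, `beta.txt` ranks first at a distance of 0.1250 because it differs by a single word. `gamma.txt` follows at 1.0000 because it shares no vocabulary with `alpha.txt`.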
