Similarity Query
Similarity Query compares one file against all the other files. You choose the target file and the tokenize options. More specifically, Lexos calls sklearn.metrics.pairwise.cosine_similarity to compute similarity as the normalized dot product of two vectors of counts from the Document Term Matrix (DTM). For example, for documents X and Y:
> CosineSimilarity(X, Y) = <X, Y> / (||X|| * ||Y||)
with the distance between documents being defined as:
distance = 1 - CosineSimilarity
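
To make the formula concrete, here is a minimal sketch in plain Python that computes the same quantity by hand. It is not the Lexos implementation: Lexos relies on sklearn.metrics.pairwise.cosine_similarity applied to the DTM, whereas this sketch assumes a simple lowercase, whitespace-based tokenizer, and the helper names (`term_counts`, `rank_by_distance`) and sample file names are hypothetical.

```python
import math
from collections import Counter


def term_counts(text):
    """Count lowercased, whitespace-separated tokens (a simplified tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(x, y):
    """Return <x, y> / (||x|| * ||y||) for two term-count mappings."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(x[t] * y[t] for t in x.keys() & y.keys())
    norm_x = math.sqrt(sum(c * c for c in x.values()))
    norm_y = math.sqrt(sum(c * c for c in y.values()))
    if norm_x == 0 or norm_y == 0:
        # Assumption for this sketch: an empty document is similar to nothing.
        return 0.0
    return dot / (norm_x * norm_y)


def rank_by_distance(target_text, other_texts):
    """Return (name, distance) pairs, most similar to the target first."""
    target = term_counts(target_text)
    distances = {
        name: 1 - cosine_similarity(target, term_counts(text))
        for name, text in other_texts.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical sample documents, used only for illustration.
target_doc = "the cat sat on the mat"
other_docs = {
    "b.txt": "the cat sat on the rug",
    "c.txt": "dogs bark loudly",
}

ranking = rank_by_distance(target_doc, other_docs)
for name, distance in ranking:
    print(f"{name}: distance = {distance:.3f}")
```

A distance near 0 means the two documents use almost the same words in the same proportions, while a distance of 1 means they share no terms at all. In this sample, `b.txt` ranks ahead of `c.txt` because it shares most of its words with the target.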
Tutorial:
After scrubbing and cutting your files, you can use Similarity Query to compare them.
1. First, select the file that you want to compare against all the others. By default, Lexos selects the first file in the list.