Description
This project builds a tool similar to Arxiv Sanity, with additional NLP functionality for finding similar papers based on their abstracts.
I used a concept from [1], which applies the earth mover’s distance (EMD) to documents represented as normalized bag-of-words vectors. The underlying transport cost between two words is given by their distance in a pre-trained word vector space. The app trains word vectors on all arXiv abstracts and uses the EMD-based metric to compute similarities between papers.
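To make the metric concrete, here is a minimal, standard-library-only sketch of an EMD between two normalized bag-of-words documents. It is an illustration under stated assumptions, not the app's actual code: the function names (`nbow`, `transport_cost`, `emd_distance`), the Euclidean word distance, and the tiny two-dimensional `toy_vectors` are all hypothetical stand-ins for vectors trained on the abstracts. The transport problem is solved with a simple successive-shortest-path routine, which is fine for a handful of words but likely far too slow for a full corpus.

```python
import math
from collections import Counter
from fractions import Fraction


def nbow(tokens):
    """Normalized bag-of-words: word -> exact fraction of the document's mass."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: Fraction(n, total) for word, n in counts.items()}


def transport_cost(supply, demand, cost):
    """Minimum cost of moving `supply` onto `demand` (both sum to 1)."""
    supply, demand = list(supply), list(demand)
    n, m = len(supply), len(demand)
    flow = [[Fraction(0)] * m for _ in range(n)]
    eps = 1e-12  # tolerance for comparing floating-point path lengths
    while any(s > 0 for s in supply):
        # Bellman-Ford on the residual graph. Nodes 0..n-1 are words of the
        # first document, nodes n..n+m-1 are words of the second one.
        dist = [0.0 if s > 0 else math.inf for s in supply] + [math.inf] * m
        prev = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(n):
                for j in range(m):
                    # Forward edge i -> j: ship more mass from word i to word j.
                    if dist[i] + cost[i][j] < dist[n + j] - eps:
                        dist[n + j], prev[n + j] = dist[i] + cost[i][j], i
                        changed = True
                    # Backward edge j -> i: undo mass that was shipped earlier.
                    if flow[i][j] > 0 and dist[n + j] - cost[i][j] < dist[i] - eps:
                        dist[i], prev[i] = dist[n + j] - cost[i][j], n + j
                        changed = True
            if not changed:
                break
        # Cheapest reachable target word that still needs mass.
        target = min((j for j in range(m) if demand[j] > 0),
                     key=lambda j: dist[n + j])
        path, node = [], n + target
        while prev[node] is not None:
            path.append((prev[node], node))
            node = prev[node]
        # `node` is now the source word at the start of the augmenting path.
        amount = min(supply[node], demand[target])
        for a, b in path:
            if a >= n:  # backward edge: limited by the flow already on it
                amount = min(amount, flow[b][a - n])
        for a, b in path:
            if a < n:
                flow[a][b - n] += amount
            else:
                flow[b][a - n] -= amount
        supply[node] -= amount
        demand[target] -= amount
    return sum(float(flow[i][j]) * cost[i][j] for i in range(n) for j in range(m))


def emd_distance(tokens_a, tokens_b, vectors):
    """EMD between two token lists; assumes every token has a vector."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    cost = [[math.dist(vectors[u], vectors[v]) for v in b] for u in a]
    return transport_cost(a.values(), b.values(), cost)


# Hypothetical 2-D "word vectors", chosen by hand for illustration only.
toy_vectors = {"neural": (0.0, 0.0), "network": (1.0, 0.0), "graph": (1.0, 1.0)}
print(emd_distance(["neural", "network"], ["neural", "graph"], toy_vectors))  # 0.5
```

In this toy case the shared word "neural" moves at zero cost, and the remaining half of the mass travels from "network" to "graph" at distance 1, giving 0.5. Word masses are kept as exact fractions so the augmenting loop terminates cleanly; only the costs are floating point.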