Arxiv explorer

A small web app for exploring arXiv papers.

Description

This project set out to build a tool similar to Arxiv Sanity, with added NLP functionality for finding similar papers based on their abstracts.

I used the approach from [1] (called Word Mover's Distance there), which applies the earth mover's distance (EMD) to documents represented as normalized bags-of-words. The underlying transport cost between two words is given by their distance in a pre-trained word vector space. The app trains word vectors on all arXiv abstracts and uses this EMD-based metric to compute similarities between papers.
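To make the metric concrete, here is a minimal pure-Python sketch of the EMD-based document distance described in [1]. It is an illustration under stated assumptions, not the code this app runs: the names `TOY_VECTORS`, `nbow`, `emd` and `word_movers_distance` are hypothetical, the 2-D word vectors are made up for the example, and the transport problem is solved with a simple successive-shortest-path routine chosen only to keep the example dependency-free. Out-of-vocabulary words and empty documents are not handled.

```python
import math
from collections import Counter
from fractions import Fraction

# Hypothetical 2-D "word vectors", invented for illustration only.
TOY_VECTORS = {
    "neural": (1.0, 0.0), "network": (0.9, 0.1),
    "deep": (1.0, 0.2), "learning": (0.8, 0.1),
    "galaxy": (0.0, 1.0), "cluster": (0.1, 0.9),
}


def nbow(tokens):
    """Normalized bag-of-words: word -> exact relative frequency (sums to 1)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: Fraction(count, total) for word, count in counts.items()}


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def emd(supply, demand, cost):
    """Minimum transport cost between two histograms of equal total mass.

    Uses successive shortest augmenting paths (Bellman-Ford on the residual
    graph). Fine for a few words, far too slow for a real corpus.
    """
    n, m = len(supply), len(demand)
    supply, demand = list(supply), list(demand)
    flow = [[Fraction(0)] * m for _ in range(n)]
    while any(s > 0 for s in supply):
        # Nodes 0..n-1 are source words, nodes n..n+m-1 are target words.
        dist = [0.0 if i < n and supply[i] > 0 else math.inf for i in range(n + m)]
        prev = [None] * (n + m)
        for _ in range(n + m - 1):
            for i in range(n):
                for j in range(m):
                    c = cost[i][j]
                    # Forward edge: move more mass from word i to word j.
                    if dist[i] + c < dist[n + j] - 1e-12:
                        dist[n + j], prev[n + j] = dist[i] + c, i
                    # Residual edge: undo mass already moved from i to j.
                    if flow[i][j] > 0 and dist[n + j] - c < dist[i] - 1e-12:
                        dist[i], prev[i] = dist[n + j] - c, n + j
        # Cheapest reachable target word that still needs mass.
        end = min((n + j for j in range(m) if demand[j] > 0), key=lambda v: dist[v])
        path = [end]
        while prev[path[-1]] is not None:
            path.append(prev[path[-1]])
        path.reverse()
        start, edges = path[0], list(zip(path, path[1:]))
        amount = min([supply[start], demand[end - n]]
                     + [flow[b][a - n] for a, b in edges if a >= n])
        for a, b in edges:
            if a < n:
                flow[a][b - n] += amount
            else:
                flow[b][a - n] -= amount
        supply[start] -= amount
        demand[end - n] -= amount
    return sum(float(flow[i][j]) * cost[i][j] for i in range(n) for j in range(m))


def word_movers_distance(tokens_a, tokens_b, vectors):
    """EMD between two nBOW documents; word-to-word cost is Euclidean distance."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    words_a, words_b = list(a), list(b)
    cost = [[euclidean(vectors[u], vectors[v]) for v in words_b] for u in words_a]
    return emd([a[w] for w in words_a], [b[w] for w in words_b], cost)


if __name__ == "__main__":
    query = ["deep", "learning"]
    print(word_movers_distance(query, ["neural", "network"], TOY_VECTORS))
    print(word_movers_distance(query, ["galaxy", "cluster"], TOY_VECTORS))
```

With these toy vectors the first distance comes out smaller than the second, so the query would be ranked closer to the "neural network" document. For a corpus the size of arXiv, a dedicated optimal transport or linear programming solver would be the more realistic choice.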


References

  1. Kusner, M. J., Sun, Y., Kolkin, N. I. & Weinberger, K. Q. From Word Embeddings to Document Distances. In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, 957–966 (JMLR.org, 2015).