All implementation details are in the GitHub repo. Because I opted to extract text from the paper's PDF rather than read its LaTeX source (I wanted to build a more generic PDF RAG along the way), it currently struggles with symbols and mathematics, and it sometimes fails to retrieve the correct context. I'd appreciate any feedback, and I hope people find it useful!
Also, the backend PDF processing server is currently single-threaded, so if embedding takes a while, please be patient!