The following article discusses a fast and accurate BM25 implementation in pure Python. It uses NumPy for efficient calculations that run outside the GIL, the standard library array module to keep memory use for stored terms to a minimum, and SQLite with caching to offload less frequently used term frequency arrays to disk: https://neuml.hashnode.dev/building-an-efficient-sparse-keyword-index-in-python
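
To make the idea concrete, here is a minimal sketch of BM25 scoring over postings held in compact array.array buffers and scored with NumPy. It is not the article's code: the function names (build_index, bm25_scores), the whitespace tokenizer, the k1 and b defaults, and the Lucene-style IDF are all assumptions chosen for illustration, and the SQLite offloading step is left out entirely.

```python
import math
from array import array
from collections import Counter

import numpy as np


def build_index(docs):
    """Build term -> (doc ids, term frequencies) using compact typed arrays.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    postings = {}
    lengths = array("q")  # document lengths in tokens, 8 bytes each
    for docid, text in enumerate(docs):
        tokens = text.lower().split()
        lengths.append(len(tokens))
        for term, freq in Counter(tokens).items():
            ids, freqs = postings.setdefault(term, (array("q"), array("q")))
            ids.append(docid)
            freqs.append(freq)
    return postings, np.array(lengths, dtype=np.float64)


def bm25_scores(query, postings, lengths, k1=1.2, b=0.75):
    """Return one BM25 score per document for a whitespace-split query."""
    n = len(lengths)
    avgdl = lengths.mean()
    scores = np.zeros(n)
    for term in query.lower().split():
        if term not in postings:
            continue
        raw_ids, raw_freqs = postings[term]
        # View the typed arrays as NumPy arrays without copying the ids.
        ids = np.frombuffer(raw_ids, dtype=np.int64)
        freqs = np.frombuffer(raw_freqs, dtype=np.int64).astype(np.float64)
        # Assumed IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        df = len(ids)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        # Length-normalized term frequency saturation, vectorized per term.
        norm = freqs + k1 * (1 - b + b * lengths[ids] / avgdl)
        scores[ids] += idf * freqs * (k1 + 1) / norm
    return scores


docs = ["the quick brown fox", "the lazy dog", "quick quick fox jumps"]
postings, lengths = build_index(docs)
scores = bm25_scores("quick fox", postings, lengths)
best = int(np.argmax(scores))  # index of the highest scoring document
```

Each term costs two small typed arrays rather than lists of Python int objects, and the per-term scoring is a handful of vectorized NumPy operations. A disk-backed version would presumably store the same buffers as blobs and load them on demand, but that part is not sketched here.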