It lets you not only submit text strings and get back their embeddings, but also compare two strings and get back a similarity score (i.e., the cosine similarity of their embedding vectors). You can also upload a plaintext file or PDF and get back the embeddings for every sentence in the file as a zipped JSON file (and you can specify the layout of this JSON file).
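To make the similarity score concrete, here is a minimal sketch of cosine similarity between two embedding vectors. The function name `cosine_similarity` and the toy vectors are just for illustration and are not taken from the project's code; a real service would most likely use a vectorized library for this.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; 0.0 is one common convention.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real ones have hundreds or thousands of dimensions).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors pointing in the same direction score close to 1, unrelated (orthogonal) ones score 0, and opposite ones score -1, regardless of vector length.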
Each time an embedding is computed for a given string with a given LLM, that vector is stored in the SQLite database, so later requests for the same string and model can be returned immediately. You can also easily search across all stored vectors using a query string; this uses FAISS, which is integrated.
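The caching idea can be sketched roughly like this, assuming a cache keyed on a hash of the text plus the model name. The class name `EmbeddingCache`, the table layout, and the `compute` callback are all hypothetical and only illustrate the pattern; the actual schema in the repo may look quite different.

```python
import hashlib
import json
import sqlite3
from typing import Callable, List


class EmbeddingCache:
    """Hypothetical cache: one row per (text hash, model name) pair."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS embeddings ("
            "text_hash TEXT NOT NULL, "
            "model_name TEXT NOT NULL, "
            "vector_json TEXT NOT NULL, "
            "PRIMARY KEY (text_hash, model_name))"
        )

    def get_or_compute(
        self,
        text: str,
        model_name: str,
        compute: Callable[[str], List[float]],
    ) -> List[float]:
        text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        row = self.conn.execute(
            "SELECT vector_json FROM embeddings "
            "WHERE text_hash = ? AND model_name = ?",
            (text_hash, model_name),
        ).fetchone()
        if row is not None:
            # Cache hit: skip the (expensive) model call entirely.
            return json.loads(row[0])
        # Cache miss: run the model once, then store the result.
        vector = compute(text)
        self.conn.execute(
            "INSERT INTO embeddings (text_hash, model_name, vector_json) "
            "VALUES (?, ?, ?)",
            (text_hash, model_name, json.dumps(vector)),
        )
        self.conn.commit()
        return vector
```

Keying on both the text and the model name matters because the same string produces different vectors under different models. Storing the vector as JSON text keeps the sketch simple; a binary blob would be more compact.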
There are lots of nice performance enhancements, including parallel inference, a database write queue, fully async everything, and even a RAM disk feature to speed up model loading.
I’m now working on adding API endpoints for easily generating sentiment scores using presets for different focus areas, but that’s still a work in progress (the code so far is in the repo, though).