Show HN: Playground for comparing embedding models on Wikipedia+book retrieval

5 points

3 years ago

Introducing embeds.ai: an embedding playground for comparing how embedding models perform on a real-world use case (retrieval-augmented generation over Wikipedia articles + Elad Gil's High Growth Handbook).

A few weeks ago, Shreyan and I were looking for an embedding model to use for RAG. We eventually came across the MTEB leaderboard, but we struggled to understand the benchmark scores.

We wanted a tool to test various embedding models with example queries on real-world datasets. After unsuccessfully looking for such a “playground”, we decided to just build one ourselves!

We embedded Hugging Face’s Simple Wikipedia dataset using @OpenAI, @Cohere, and two open-source models via @Baseten. We then stored the embeddings in @Supabase using pgvector. Finally, we built a web app using Next.js and deployed it on @Vercel.
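
For anyone curious what "comparing models on retrieval" boils down to, here is a rough sketch (not the code from our repo). It assumes each model has already turned the documents and the query into vectors, and it uses brute-force cosine similarity in plain Python as a stand-in for the pgvector search. The model names, document ids, and vectors are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def compare_models(query_by_model, index_by_model, k=2):
    """Run the same query against each model's index; return ranked doc ids."""
    return {
        model: [doc_id for doc_id, _ in top_k(query_by_model[model], index, k)]
        for model, index in index_by_model.items()
    }


# Hypothetical toy data: each model embeds the same three documents,
# possibly with a different number of dimensions.
index_by_model = {
    "model_a": {
        "cats": [0.9, 0.1, 0.0],
        "dogs": [0.7, 0.3, 0.1],
        "tax_law": [0.0, 0.2, 0.9],
    },
    "model_b": {
        "cats": [0.2, 0.9],
        "dogs": [0.3, 0.8],
        "tax_law": [0.9, 0.1],
    },
}

# The same query text, embedded once per model (vectors are invented).
query_by_model = {
    "model_a": [0.8, 0.2, 0.0],
    "model_b": [0.3, 0.8],
}

results = compare_models(query_by_model, index_by_model, k=2)
for model, ranked in results.items():
    print(model, ranked)
```

Showing the ranked results side by side is the whole idea of the playground: the two toy "models" agree on which documents are relevant but disagree on the order, which is the kind of difference a single benchmark number hides.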

Now we’re hosting the playground for anyone to use for free, as well as open-sourcing our work so people can try evaluating other models, datasets, or indexes.

Learn more in our full blog post here: https://shreyanjain.substack.com/p/announcing-embedding-batt...

And the repo is here: https://github.com/EGCap/playground

If you have other suggestions / pain points from working with embedding models, vector DBs, or RAG, or if you would like to collaborate on any of the above or unrelated projects, please reach out! @shreyanj98 @davidtsong on Twitter
