Show HN: Sycamore – an LLM-powered semantic data preparation system for search

3 years ago

We’re Aryn, and yesterday we came out of stealth: blog.aryn.ai. As part of that, we released Sycamore: https://github.com/aryn-ai/sycamore.

Sycamore is an LLM-powered semantic data preparation system for building search applications. It introduces a distributed set-based abstraction, the DocSet, that makes processing a large document collection as easy as reading a single document. With Sycamore, you can use LLMs to transform and enrich your unstructured data and prepare it for search. It comes with a scalable distributed runtime, built on Ray, that helps you go from prototype to production.

For example, with Sycamore, you can read a collection of PDFs, partition them into coherent chunks, pull out entities like titles and authors, compute vector embeddings, and load them into a local OpenSearch cluster, all in a few lines of code.
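
To give a feel for the shape of such a pipeline, here is a minimal sketch in plain Python. It does not use Sycamore, Ray, or OpenSearch: `ToyDocSet`, `Document`, and the helper functions are hypothetical stand-ins invented for illustration, the "title extraction" is a first-line heuristic rather than an LLM call, and the "embedding" is a letter-frequency vector rather than a real model. See the docs for the actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Document:
    text: str
    properties: dict = field(default_factory=dict)


class ToyDocSet:
    """Hypothetical stand-in for a DocSet: a collection with chainable transforms."""

    def __init__(self, docs: Iterable[Document]):
        self.docs = list(docs)

    def map(self, fn: Callable[[Document], Document]) -> "ToyDocSet":
        # Apply fn to every document, one output per input.
        return ToyDocSet(fn(d) for d in self.docs)

    def flat_map(self, fn: Callable[[Document], List[Document]]) -> "ToyDocSet":
        # Apply fn to every document and flatten the resulting lists.
        return ToyDocSet(out for d in self.docs for out in fn(d))


def partition(doc: Document) -> Document:
    # Toy partitioner: split on blank lines and keep the chunks on the document.
    chunks = [c.strip() for c in doc.text.split("\n\n") if c.strip()]
    return Document(doc.text, {**doc.properties, "chunks": chunks})


def extract_title(doc: Document) -> Document:
    # Toy heuristic: treat the first chunk as the title (a real system might ask an LLM).
    chunks = doc.properties.get("chunks", [])
    return Document(doc.text, {**doc.properties, "title": chunks[0] if chunks else ""})


def explode(doc: Document) -> List[Document]:
    # Turn each chunk into its own document, carrying the parent's title along.
    title = doc.properties.get("title", "")
    return [Document(c, {"title": title}) for c in doc.properties.get("chunks", [])]


def embed(doc: Document) -> Document:
    # Placeholder embedding: normalized counts of the letters a-z (26 dimensions).
    counts = [0.0] * 26
    for ch in doc.text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    total = sum(counts) or 1.0
    return Document(doc.text, {**doc.properties, "embedding": [c / total for c in counts]})


# Stand-ins for text already read out of PDFs.
raw = [
    Document("Paper One\n\nFirst section text.\n\nSecond section text."),
    Document("Paper Two\n\nOnly section."),
]

# The whole collection is processed with the same chain you would write for one document.
index = (
    ToyDocSet(raw)
    .map(partition)
    .map(extract_title)
    .flat_map(explode)
    .map(embed)
    .docs
)
```

Here `index` is just an in-memory list standing in for the search cluster. The point of the sketch is the chaining: each step is written as if for a single document, and the collection type applies it to all of them.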

To learn more, visit the repo: https://github.com/aryn-ai/sycamore, docs: https://sycamore.readthedocs.io/, and demo: https://www.loom.com/share/53e68b0eb5ab49948111a3fcf6286b7f?...

We’d love for you to try it out, give us feedback, and contribute.