1. Recent advances in LLMs make them very effective at extracting meaning from unstructured documents. We wanted to use these different LLMs through a single library.
2. Static (a.k.a. non-LLM-based) parsing libraries like PyMuPDF are fast and accurate on simple documents, but their licenses often aren’t as permissive as we’d like.
3. We wanted a single library that could integrate both static and LLM-based approaches in a seamless and efficient way.
Key Features:
- Unified Parsing: Use either static parsers or LLM-based ones, depending on your needs.
- Auto-routing Mode: Pages are analyzed independently—faster static parsers handle straightforward pages, while LLMs process pages requiring deeper understanding.
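
To make the auto-routing idea concrete, here is a minimal sketch of per-page routing. It is not Lexoid's actual implementation or API: the `Page` fields, the `needs_llm` heuristic, its `min_chars` threshold, and the two parser callables are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical per-page summary produced by a cheap first pass."""
    text: str         # text layer found on the page (may be empty for scans)
    image_count: int  # number of embedded images
    table_count: int  # number of detected tables


def needs_llm(page: Page, min_chars: int = 200) -> bool:
    """Assumed heuristic: send a page to the LLM if it has little
    extractable text, or contains images or tables."""
    sparse_text = len(page.text.strip()) < min_chars
    return sparse_text or page.image_count > 0 or page.table_count > 0


def route_pages(
    pages: List[Page],
    static_parse: Callable[[Page], str],
    llm_parse: Callable[[Page], str],
) -> List[str]:
    """Decide for each page independently which parser handles it."""
    results = []
    for page in pages:
        parser = llm_parse if needs_llm(page) else static_parse
        results.append(parser(page))
    return results
```

Because the decision is made page by page, a long document that is mostly plain text with a few scanned or table-heavy pages would only pay the LLM cost on those few pages. A real router would likely use richer signals than the three fields assumed here.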
Why we think it’s useful:
Static parsers are fast but struggle with complex, unstructured data. LLMs, on the other hand, handle such documents far more accurately but can be expensive and slow. Lexoid lets you use the best tool for the job, automatically if needed.