We’ve been analyzing shipping documents with LLMs for over a year with Cube. Along the way, we ran into two major challenges in document analysis. First, each client had different document formats (PDFs, Excel sheets, emails), each requiring a custom parser. Second, getting consistent, structured outputs from LLMs was a constant struggle: small prompt changes led to unpredictable results. After months of building parsers and refining prompts, we realized that everyone working with LLMs faces these same challenges, so we built UiForm as an open solution that handles both document processing and prompt engineering in one cohesive system.
Today:
1. We’re launching UiForm (free for all!), an API that pre-processes any file (e.g. Excel, email, …) for use with LLMs. We built it to be compatible with Pydantic, JSON schemas, and most LLM providers.
2. We're open-sourcing a prompt engineering framework that combines JSON schema validation with Chain-of-Thought reasoning to ensure reliable structured outputs.
Prompt engineering is managed directly within the JSON schema using three additional directives:
- X-SystemPrompt sets the system prompt that accompanies the schema.
- X-FieldPrompt enhances the standard field description, decoupling prompt engineering from schema specification.
- X-ReasoningPrompt creates an auxiliary reasoning field that gives the LLM room to think before it answers, which helps on complex data.
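To make the three directives concrete, here is a minimal sketch of how a schema carrying them could be expanded into a plain JSON schema for the model. It is an illustration, not UiForm's actual implementation: the shipping fields, the `expand_schema` and `strip_reasoning` helpers, and the `reasoning__` prefix are all made up for this example, and treating X-FieldPrompt as a replacement for the description sent to the model is an assumption.

```python
import copy

# Hypothetical shipping-document schema. The three X-* keys are the
# directives described above; everything else is standard JSON Schema.
SCHEMA = {
    "type": "object",
    "X-SystemPrompt": "You extract structured data from shipping documents.",
    "properties": {
        "container_number": {
            "type": "string",
            "description": "Container number.",
            "X-FieldPrompt": "Look for 4 letters followed by 7 digits.",
        },
        "total_weight_kg": {
            "type": "number",
            "description": "Gross weight in kilograms.",
            "X-ReasoningPrompt": "List every weight and its unit, then convert to kg.",
        },
    },
    "required": ["container_number", "total_weight_kg"],
}

# Assumed naming convention for auxiliary fields (not necessarily UiForm's).
REASONING_PREFIX = "reasoning__"


def expand_schema(schema):
    """Return (system_prompt, plain_schema) with the X-* directives resolved."""
    schema = copy.deepcopy(schema)  # leave the caller's schema untouched
    system_prompt = schema.pop("X-SystemPrompt", "")
    originally_required = set(schema.get("required", []))
    properties, required = {}, []
    for name, spec in schema["properties"].items():
        reasoning = spec.pop("X-ReasoningPrompt", None)
        field_prompt = spec.pop("X-FieldPrompt", None)
        if reasoning is not None:
            # Insert the reasoning field BEFORE its target so the model
            # writes its reasoning first, then the final value.
            aux = REASONING_PREFIX + name
            properties[aux] = {"type": "string", "description": reasoning}
            required.append(aux)
        if field_prompt is not None:
            # Assumption: the field prompt replaces the description the model sees.
            spec["description"] = field_prompt
        properties[name] = spec
        if name in originally_required:
            required.append(name)
    schema["properties"] = properties
    schema["required"] = required
    return system_prompt, schema


def strip_reasoning(output):
    """Drop auxiliary reasoning fields from a parsed model response."""
    return {k: v for k, v in output.items() if not k.startswith(REASONING_PREFIX)}


system_prompt, llm_schema = expand_schema(SCHEMA)
print(list(llm_schema["properties"]))
```

The expanded schema can be passed to any provider that supports structured outputs. Once the response is parsed, `strip_reasoning` removes the auxiliary fields so downstream code only sees the fields declared in the original schema.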
Since o1, everybody's been talking about CoT and inference-time compute. We found that using reasoning fields with structured generation improves performance on document analysis tasks.
We’d love to see document analysis become more community-driven, with people sharing their JSON schemas for different use cases, which is why we open-sourced our prompt engineering utility :)
Looking forward to hearing your thoughts. We’ll be in the comments or on Discord (https://discord.com/invite/vc5tWRPqag). Thanks!