After watching too many agents confidently lie in production, I built Director-AI.
It sits between your LLM and the user, scoring every generated token with: • 0.6× DeBERTa-v3 NLI (contradiction detection) • 0.4× RAG against your own ChromaDB knowledge base
If coherence < threshold → Rust kernel halts the stream before the token is sent.
Key technical bits: • Works with any OpenAI-compatible endpoint (Ollama, vLLM, llama.cpp, Groq, OpenAI, Claude…) • StreamingKernel + windowed scoring • GroundTruthStore.add() for easy fact ingestion • Dual licensing: AGPL open + commercial (closed-source/SaaS OK)
Honest AggreFact numbers inside (66.2% balanced acc with streaming enabled). Not claiming SOTA on static NLI — the value is in the live gating + custom KB system.
Repo + full examples: https://github.com/anulum/director-ai
Would love feedback on the scoring weights, halt logic, or kernel design. What hallucination problems are you solving today?