What kept bothering me about AI features in production wasn’t really building them, but everything that comes after: explaining why something happened, reproducing it weeks later, or changing prompts/models without breaking things in subtle ways.
Logs helped a bit, but not enough. Agent frameworks felt too implicit for my taste. And model upgrades were honestly scary: outputs would change, and it wasn’t always obvious where or why.
So I ended up building a very small, explicit kernel in which each AI step can be replayed, diffed, and reviewed. Think of it as Git-style workflows for AI decisions, but without trying to be a framework or a runtime.
It’s not an agent framework, a chat UI, or a platform; it’s just a TypeScript library focused on explicit state, audit events, and replay + diff.
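To make "replay + diff" a bit more concrete, here is a minimal sketch of the general idea in plain TypeScript. It is not verist's API: runStep, replay, diffOutputs, the StepRecord and AuditEvent shapes, and the two stand-in "models" are hypothetical names assumed purely for illustration, and the models are deterministic functions rather than real LLM calls.

```typescript
// Hypothetical sketch only: none of these names come from verist's actual API.

// A model call is modeled as a plain function; a real one would call an LLM.
type ModelFn<I, O> = (input: I, promptVersion: string) => O;

// A recorded AI step: everything needed to explain and re-run it later.
interface StepRecord<I, O> {
  stepId: string;
  model: string;
  promptVersion: string;
  input: I;
  output: O;
}

// An audit event appended every time a step runs.
interface AuditEvent {
  kind: "step_run";
  stepId: string;
  model: string;
  promptVersion: string;
}

const auditLog: AuditEvent[] = [];

function runStep<I, O>(
  stepId: string,
  model: string,
  promptVersion: string,
  fn: ModelFn<I, O>,
  input: I,
): StepRecord<I, O> {
  const output = fn(input, promptVersion);
  auditLog.push({ kind: "step_run", stepId, model, promptVersion });
  return { stepId, model, promptVersion, input, output };
}

// Replay re-runs a recorded step on the same input with a (possibly) new model or prompt.
function replay<I, O>(
  record: StepRecord<I, O>,
  model: string,
  promptVersion: string,
  fn: ModelFn<I, O>,
): StepRecord<I, O> {
  return runStep(record.stepId, model, promptVersion, fn, record.input);
}

// Shallow, field-level diff of two flat output objects.
type FieldDiff = { field: string; before: unknown; after: unknown };

function diffOutputs(a: Record<string, unknown>, b: Record<string, unknown>): FieldDiff[] {
  // Union of keys from both sides, preserving a's key order first.
  const fields = Object.keys(a).concat(Object.keys(b).filter((k) => !(k in a)));
  const diffs: FieldDiff[] = [];
  for (const field of fields) {
    if (a[field] !== b[field]) diffs.push({ field, before: a[field], after: b[field] });
  }
  return diffs;
}

// Usage: a stand-in "model upgrade" that changes the label for short inputs.
type Classification = { label: string; chars: number };

const modelV1: ModelFn<string, Classification> = (text) => ({
  label: text.includes("refund") ? "billing" : "other",
  chars: text.length,
});
const modelV2: ModelFn<string, Classification> = (text) => ({
  label: text.length < 20 ? "other" : text.includes("refund") ? "billing" : "other",
  chars: text.length,
});

const original = runStep("classify-ticket", "model-v1", "prompt-3", modelV1, "refund please");
const replayed = replay(original, "model-v2", "prompt-3", modelV2);
const changes = diffOutputs(original.output, replayed.output);
// changes → [{ field: "label", before: "billing", after: "other" }]
```

The point of the sketch is the shape, not the details: because each step's input, model, and prompt version are recorded explicitly, a model upgrade becomes a replay plus a reviewable diff instead of a silent change in behavior.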
Repo: https://github.com/verist-ai/verist
I’m especially curious if others here have run into similar issues shipping AI features to prod, or if this feels like overkill. Happy to answer questions or hear criticism.