Show HN: Calx – track and compile corrections humans make with AI agents

2 months ago

Last year I got laid off and started building a company. Fast forward to a month ago: I built a production system with 6 AI agents across 82,000 lines of code in 20 days for $250. I kept obsessive correction logs. Every time an agent made a mistake and I told it what to do differently, I made sure it logged the correction itself.

When I transferred 237 of those corrections as rules to a new agent, to save onboarding time in a new repo, it made 44 new mistakes. Thirteen of them were in categories the rules explicitly covered. The rules were present in context. The behavior wasn't there. I published the field study with the full correction logs.

Then Meta's Superintelligence Labs published HyperAgents (arXiv:2603.19461, March 2026). They found the complementary result: improvements DO transfer across domains when embodied in executable mechanisms (persistent memory, performance tracking, eval loops), not when written as rule text. Two independent studies, same boundary: documentation is not behavior.

So I built Calx. Running pip install getcalx gives you a CLI + MCP server that:

- Captures corrections developers make to AI agents
- Detects recurrence via keyword similarity (Jaccard) and auto-promotes at a 3x threshold
- Promotes recurring corrections to enforced rules and hooks, injected at session start
- Scopes rules per domain/directory so each agent gets only what's relevant
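The post doesn't show how recurrence detection works internally. As a rough illustration, here is a minimal Python sketch of Jaccard-based recurrence detection that promotes a group of similar corrections on its third occurrence. It assumes "3x threshold" means three similar corrections; the tokenizer, stop-word list, 0.4 similarity cutoff, and the names keywords, jaccard, Cluster, and RecurrenceTracker are all hypothetical, not Calx's API.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stop words; the post does not say how Calx extracts keywords.
STOP_WORDS = {"the", "and", "for", "you", "your", "with", "this", "that", "please"}


def keywords(text: str) -> set[str]:
    """Lowercase the text, split it into word tokens, drop short and stop words."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    return {t for t in tokens if len(t) > 2 and t not in STOP_WORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Cluster:
    representative: set[str]  # keywords of the first correction in the cluster
    corrections: list[str] = field(default_factory=list)
    promoted: bool = False


class RecurrenceTracker:
    """Groups similar corrections and promotes a group once it recurs often enough."""

    def __init__(self, threshold: float = 0.4, promote_at: int = 3):
        self.threshold = threshold    # hypothetical similarity cutoff
        self.promote_at = promote_at  # assumed reading of the "3x threshold"
        self.clusters: list[Cluster] = []

    def add(self, correction: str) -> Cluster:
        kw = keywords(correction)
        # Pick the existing cluster whose representative is most similar.
        best = max(self.clusters, key=lambda c: jaccard(kw, c.representative), default=None)
        if best is None or jaccard(kw, best.representative) < self.threshold:
            best = Cluster(representative=kw)
            self.clusters.append(best)
        best.corrections.append(correction)
        if len(best.corrections) >= self.promote_at:
            best.promoted = True  # candidate for an enforced rule or hook
        return best


if __name__ == "__main__":
    tracker = RecurrenceTracker()
    for note in [
        "Run the test suite before committing changes",
        "Always run the test suite before you commit",
        "Run the test suite before every commit",
        "Use snake_case for Python module names",
    ]:
        tracker.add(note)
    for c in tracker.clusters:
        print(len(c.corrections), c.promoted)
    # prints "3 True" then "1 False"
```

Comparing against a fixed representative, rather than merging keywords into the cluster, keeps the union from growing with every match and steadily lowering future similarity scores.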

It runs as a FastMCP server over Streamable HTTP, with local SQLite storage, so any MCP-compatible client can connect: Claude Code, Claude Desktop, Cursor, or custom agents. It is designed primarily for Claude Code. It also handles token discipline (keeping context compaction from destroying the correction signal), multi-agent orchestration, session lifecycle hooks, orientation gates, and dirty-exit recovery.
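Rule storage and per-directory scoping can be pictured in a similar way. This is a minimal sketch, assuming a single SQLite table of promoted rules keyed by a directory prefix, with the matching rules rendered into a text block at session start. The schema and the functions open_store, add_rule, in_scope, and session_start_context are hypothetical; the post does not describe Calx's actual schema.

```python
import sqlite3
from pathlib import PurePosixPath

# Hypothetical schema: one row per promoted rule, scoped by a directory prefix.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rules (
    id    INTEGER PRIMARY KEY,
    scope TEXT NOT NULL,  -- '' means the rule applies everywhere
    text  TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_rule(conn: sqlite3.Connection, scope: str, text: str) -> None:
    conn.execute("INSERT INTO rules (scope, text) VALUES (?, ?)", (scope.strip("/"), text))
    conn.commit()


def in_scope(scope: str, working_dir: str) -> bool:
    """True if working_dir equals scope or lies beneath it, compared by path component."""
    if not scope:
        return True
    scope_parts = PurePosixPath(scope).parts
    dir_parts = PurePosixPath(working_dir.strip("/")).parts
    return dir_parts[: len(scope_parts)] == scope_parts


def session_start_context(conn: sqlite3.Connection, working_dir: str) -> str:
    """Render only the rules relevant to working_dir, for injection at session start."""
    rows = conn.execute("SELECT scope, text FROM rules ORDER BY id").fetchall()
    relevant = [text for scope, text in rows if in_scope(scope, working_dir)]
    if not relevant:
        return ""
    return "Rules for this session:\n" + "\n".join(f"- {t}" for t in relevant)


if __name__ == "__main__":
    conn = open_store()
    add_rule(conn, "", "Log every correction the developer makes.")
    add_rule(conn, "api/", "Validate request bodies against the shared schema.")
    add_rule(conn, "web", "Use the design-system button component.")
    print(session_start_context(conn, "api/handlers"))
    # Rules for this session:
    # - Log every correction the developer makes.
    # - Validate request bodies against the shared schema.
```

Matching on path components rather than raw string prefixes keeps a rule scoped to api from leaking into a sibling directory such as apidocs.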

How this differs from agent memory tools: existing agent memory systems store information for retrieval. Calx tracks the behavioral plane: how an agent works with a specific person, not just what it knows. The data shows that the information plane alone doesn't reliably change behavior.
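To make the information-plane versus behavioral-plane distinction concrete: a rule stored as text only helps if the agent reads and follows it, while an enforced rule is a check that runs before the agent acts. The sketch below is a hypothetical gate, not Calx's hook format. Each rule pairs a regex with the correction message to return when a proposed shell command matches it; the names EnforcedRule, RULES, and gate, and both example rules, are illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EnforcedRule:
    """A promoted correction compiled into an executable check (hypothetical format)."""
    name: str
    forbidden: re.Pattern  # a proposed command matching this is blocked
    message: str           # the correction text returned to the agent


RULES = [
    EnforcedRule(
        "no-force-push",
        # Blocks --force and -f, but allows --force-with-lease.
        re.compile(r"\bgit\s+push\b.*\s(?:--force(?!-with-lease)|-f\b)"),
        "Don't force-push; use --force-with-lease or open a PR.",
    ),
    EnforcedRule(
        "no-sudo-pip",
        re.compile(r"\bsudo\s+pip3?\b"),
        "Install into the project virtualenv, not system Python.",
    ),
]


def gate(command: str, rules: list[EnforcedRule] = RULES) -> list[str]:
    """Return the message of every rule the command violates; an empty list means allow."""
    return [rule.message for rule in rules if rule.forbidden.search(command)]


if __name__ == "__main__":
    print(gate("git push --force origin main"))
    print(gate("git push --force-with-lease origin main"))
```

A non-empty result would block the action and feed the messages back to the agent, so the correction is applied whether or not the agent "remembers" it. In Claude Code, a PreToolUse hook is one plausible place to run a check like this.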

v0.5.0, 443 tests, MIT license. Paper with full evidence: https://doi.org/10.5281/zenodo.19159223
