Show HN: I built AgentSafety, an open benchmark for coding-agent safety

1 point

3 months ago

I built AgentSafety to test whether autonomous coding agents make the right safety call: allow, ask, or refuse.

v0.1 includes 50 benchmark cases and focuses on practical failure modes like prompt injection, secret access, destructive commands, out-of-workspace writes, dependency installs, and ambiguous intent. It also includes a policy baseline plus reproducible run artifacts and comparison reports.

I’d really value feedback on case quality, labeling/scoring, and what’s missing for real-world agent evaluation.