Average compliance score: 2.2 out of 6 articles 97% of files fail Article 9 (Risk Management) 89% fail Article 12 (Record-Keeping) 84% fail Article 14 (Human Oversight) Only 23 out of 5,754 files (0.4%) pass all 6 checks Best scoring repo: AutoGPT at 2.9/6. Worst: CrewAI examples at 1.4/6
What the scanner checks (per article):
Art. 9: risk classification, access control, risk audit Art. 10: input validation, PII handling, data schemas, provenance Art. 11: logging, documentation, type hints Art. 12: structured logging, audit trail, timestamps, log integrity Art. 14: human review, override mechanism, notifications Art. 15: input sanitization, error handling, testing, rate limiting
An article "passes" if at least 1 sub-check is detected. This is generous — real compliance requires substantially more. Caveats I'll save you the trouble of pointing out:
This is static analysis. It can't verify runtime behavior. File-level scanning misses cross-file compliance patterns. The pass threshold is intentionally lenient (1-of-N sub-checks). This checks technical requirements, not legal compliance. It's a linter, not a lawyer.
The EU AI Act enforcement deadline is August 2026. The full report, raw data (JSON), and the scanning scripts are all in the repo.
GitHub: https://github.com/air-blackbox/air-blackbox-mcp Full report: https://github.com/air-blackbox/air-blackbox-mcp/blob/main/b... Install: pip install air-blackbox-mcp Demo: https://huggingface.co/spaces/airblackbox/air-blackbox-scann...
Happy to answer questions about the methodology, the scanner internals, or what we're building next (fine-tuned local LLM for deeper analysis — your code never leaves your machine).