Benchmark proves cultural grounding: Triad 100% vs Claude 4.6 45% on 222q anachronism test.
Public: eval framework + 20 sample questions Gated: full research dataset (airtrek.ai/research)
Cultural intelligence that frontier models fail.
Feedback welcome!