Natural language benchmarks don’t measure AI models’ general knowledge well | Heykuki News