LLM evals test outputs. Rarely whether the model understood first | Heykuki News