Heykuki News
LoCoMo AI Benchmark: 6.4% of answer key wrong, judge accepts 63% of fake answers
github.com/dial481
3 points
dial481
2 months ago
3 comments