Heykuki News

Top | New | Best | Ask | Show | Jobs
1.
Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions (github.com/AGI-Eval-Official)
3 points
jinqueeny
5 months ago
2.
Stop benchmarking LLMs. Make them fight (github.com/AGI-Eval-Official)
2 points
jinqueeny
5 months ago