Show HN: Benchmarking LLM Agents on Consequential Real World Tasks

Heykuki News

3 points

a year ago

A benchmark you can run locally to test how well LLMs and AI agents perform real-world tasks.
