Show HN: LLM Tree Navigation Benchmark

Heykuki News

6 points

2 years ago

This benchmark measures how well various LLMs navigate a fictional codebase by iteratively expanding its directory tree and observing the result.
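
To make "iterative expansion and observation" concrete, here is a minimal sketch of what one navigation episode could look like. It is not the benchmark's actual code: the `TREE` layout, the `navigate` loop, and the `oracle_choose` stand-in for the model call are all hypothetical names made up for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical fictional codebase: a dict is a directory, None is a file.
TREE = {
    "src": {
        "auth": {"login.py": None, "tokens.py": None},
        "billing": {"invoice.py": None},
    },
    "README.md": None,
}


def children(tree: dict, path: List[str]) -> Optional[List[str]]:
    """Return the sorted entries of the directory at `path`, or None for a file or missing path."""
    node = tree
    for part in path:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return sorted(node) if isinstance(node, dict) else None


def navigate(
    tree: dict,
    target: str,
    choose: Callable[[str, List[str], str], str],
    max_steps: int = 10,
) -> Optional[int]:
    """Expand one level per step; return the number of steps used, or None on failure."""
    path: List[str] = []
    for step in range(1, max_steps + 1):
        entries = children(tree, path)  # observation: what the model gets to see
        if not entries:
            return None  # dead end: a file that is not the target, or an empty directory
        pick = choose("/".join(path), entries, target)
        if pick not in entries:
            return None  # an entry that does not exist counts as a failure
        path.append(pick)  # expansion: descend into the chosen entry
        if "/".join(path) == target:
            return step
    return None


def oracle_choose(current: str, entries: List[str], target: str) -> str:
    """Stand-in for an LLM call: always picks the entry on the path to the target."""
    prefix = current + "/" if current else ""
    return target[len(prefix):].split("/")[0]


print(navigate(TREE, "src/auth/tokens.py", oracle_choose))  # 3
```

In a real run, `choose` would wrap a prompt to the model under test, and the step count or failure would feed into the score.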

Each model's baseline performance is compared against runs that apply combinations of prompt-engineering modifications, to quantify how much each combination helps or hinders the model.
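
One way to picture the baseline-versus-mods comparison is sketched below. This is an assumption about the bookkeeping, not the repo's implementation: whether the benchmark enumerates every subset of mods is not stated here, and the mod names and scores are placeholders, not benchmark results.

```python
from itertools import combinations
from typing import Dict, Iterator, Sequence, Tuple

Combo = Tuple[str, ...]


def mod_combinations(mods: Sequence[str]) -> Iterator[Combo]:
    """Yield every subset of `mods`, starting with the empty tuple (the baseline)."""
    for size in range(len(mods) + 1):
        yield from combinations(mods, size)


def deltas_vs_baseline(scores: Dict[Combo, int]) -> Dict[Combo, int]:
    """Map each non-empty mod combination to its score minus the baseline score."""
    baseline = scores[()]
    return {combo: score - baseline for combo, score in scores.items() if combo}


# Hypothetical mod names and placeholder scores, purely to show the shape of the data.
mods = ["mod_a", "mod_b"]
placeholder_scores = dict(zip(mod_combinations(mods), [50, 60, 45, 70]))

for combo, delta in deltas_vs_baseline(placeholder_scores).items():
    print("+".join(combo), delta)
```

A positive delta means the combination helped relative to the unmodified prompt, and a negative one means it hurt.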

Interesting findings here: https://github.com/aiwebb/treenav-bench#interesting-findings

1 comment