Search: github.com/SWE-bench | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

Top model scores may be skewed by Git history leaks in SWE-bench (github.com/SWE-bench)

466 points

9 months ago

2.

SWE-Bench Pro (github.com/scaleapi)

101 points

8 months ago

3.

SWE-bench verified agents may look at future repository state (github.com/SWE-bench)

4 points

9 months ago

4.

I made a viewer for the SWE-Bench dataset (github.com/mwufi)

1 point

2 years ago

5.

Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces (npmjs.com)

10 points

a month ago

6.

Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of Python (github.com/SWE-agent)

7 points

10 months ago

7.

Show HN: New eval from SWE-bench team evaluates LMs based on goals not tickets (codeclash.ai)

5 points

7 months ago

8.

Show HN: Mcpbr – does your MCP help? Test it on SWE-bench and 25 evals (github.com/greynewell)

4 points

4 months ago

9.

SWE-gen: Scaling SWE-bench task generation (github.com/abundant-ai)

4 points

4 months ago

10.

SWE-Bench for Taxes (github.com/column-tax)

3 points

10 months ago

11.

Show HN: Loki Mode hit 99.67% SWE-Bench – MAF built a SaaS overnight (github.com/asklokesh)

2 points

5 months ago

12.

talkie-coder: From 1930 to SWE-bench (github.com/RicardoDominguez)

2 points

a month ago

13.

Some critical issues with the SWE-bench-Pro environments (github.com/SWE-agent)

2 points

2 months ago

14.

Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com)

2 points

10 months ago

15.

Show HN: Sales Agent Benchmark – SWE-Bench for sales AI agents (open source) (sales-agent-benchmarks.fly.dev)

1 point

4 months ago

16.

Show HN: Statewright – Visual state machines that make AI agents reliable (github.com/statewright)

126 points

23 days ago

17.

Show HN: AgentKit – JavaScript Alternative to OpenAI Agents SDK with Native MCP (github.com/inngest)

64 points

a year ago

18.

Show HN: Anterion – Open-source AI software engineer (SWE-agent and OpenDevin) (github.com/MiscellaneousStuff)

4 points

2 years ago

19.

Show HN: Qwen3-Coder API – 480B open-source code LLM (netmind.ai)

3 points

10 months ago

20.

Show HN: Gemini 2.5 is the best model for Kotlin and Android dev (firebender.com)

3 points

aman-firebender

a year ago

21.

Show HN: Tarmac – Know what Claude Code will cost before you run it (github.com/CodeSarthak)

2 points

sarthakaggarwal

3 months ago

22.

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified (arxiv.org)

2 points

4 months ago

23.

Show HN: Salacia – The First Runtime OS for Agentic Coding

1 point

3 months ago

24.

Show HN: Repowise – Codebase intelligence for AI coding agents (open source) (github.com/repowise-dev)

1 point

raghavchamadiya

2 months ago

25.

Show HN: Tracecore: Benchmark AI Agents on Deterministic Coding Tasks (github.com/justindobbs)

1 point

3 months ago

26.

Show HN: Frouter – Live-ping and auto-configure free AI models for coding agents (github.com/jyoung105)

1 point

3 months ago