Heykuki News

TopNewBestAskShowJobs
1.
Top model scores may be skewed by Git history leaks in SWE-bench (github.com/SWE-bench)
466 points
mustaphah
9 months ago
153 comments
2.
SWE-Bench Pro (github.com/scaleapi)
101 points
tosh
8 months ago
28 comments
3.
SWE-bench verified agents may look at future repository state (github.com/SWE-bench)
4 points
brrrrrm
9 months ago
discuss
4.
I made a viewer for the SWE-Bench dataset (github.com/mwufi)
1 point
randomcatuser
2 years ago
discuss
5.
Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces (npmjs.com)
10 points
george_ciobanu
a month ago
2 comments
6.
Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of python (github.com/SWE-agent)
7 points
lieret
10 months ago
4 comments
7.
Show HN: New eval from SWE-bench team evaluates LMs based on goals not tickets (codeclash.ai)
5 points
lieret
7 months ago
1 comment
8.
Show HN: Mcpbr – does your MCP help? Test it on SWE-bench and 25 evals (github.com/greynewell)
4 points
greynewell
4 months ago
discuss
9.
SWE-gen: Scaling SWE-bench task generation (github.com/abundant-ai)
4 points
coffeecoder123
4 months ago
discuss
10.
SWE-Bench for Taxes (github.com/column-tax)
3 points
michaelrbock
10 months ago
discuss
11.
Show HN: Loki Mode hit 99.67% SWE-Bench – MAF built a SaaS overnight (github.com/asklokesh)
2 points
slogansand
5 months ago
5 comments
12.
talkie-coder: From 1930 to SWE-bench (github.com/RicardoDominguez)
2 points
Philpax
a month ago
discuss
13.
Some critical issues with the SWE-bench-Pro environments (github.com/SWE-agent)
2 points
snoopyswe
2 months ago
discuss
14.
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com)
2 points
lieret
10 months ago
discuss
15.
Show HN: Sales Agent Benchmark – SWE-Bench for sales AI agents (open source) (sales-agent-benchmarks.fly.dev)
1 point
a1j9o94
4 months ago
discuss
16.
Show HN: Statewright – Visual state machines that make AI agents reliable (github.com/statewright)
126 points
azurewraith
23 days ago
59 comments
17.
Show HN: AgentKit – JavaScript Alternative to OpenAI Agents SDK with Native MCP (github.com/inngest)
64 points
tonyhb
a year ago
15 comments
18.
Show HN: Anterion – Open-source AI software engineer (SWE-agent and OpenDevin) (github.com/MiscellaneousStuff)
4 points
miscstuffz
2 years ago
2 comments
19.
Show HN: Qwen3-Coder API – 480B open-source code LLM (netmind.ai)
3 points
elricwan
10 months ago
1 comment
20.
Show HN: Gemini 2.5 is the best model for Kotlin and Android dev (firebender.com)
3 points
aman-firebender
a year ago
discuss
21.
Show HN: Tarmac – Know what Claude Code will cost before you run it (github.com/CodeSarthak)
2 points
sarthakaggarwal
3 months ago
1 comment
22.
Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified (arxiv.org)
2 points
NBenkovich
4 months ago
discuss
23.
Show HN: Salacia – The First Runtime OS for Agentic Coding
1 point
alfredhua
3 months ago
1 comment
24.
Show HN: Repowise – Codebase intelligence for AI coding agents (open source) (github.com/repowise-dev)
1 point
raghavchamadiya
2 months ago
discuss
25.
Show HN: Tracecore: Benchmark AI Agents on Deterministic Coding Tasks (github.com/justindobbs)
1 point
extra_cookin
3 months ago
discuss
26.
Show HN: Frouter – Live-ping and auto-configure free AI models for coding agents (github.com/jyoung105)
1 point
jyoung105
3 months ago
discuss