Heykuki News
1.
Show HN: LLM Function Calling Library to Interact with File, Shell, Git and Code
(swekit.dev)
5 points
soham123
2 years ago
discuss
2.
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds
(swebench.com)
2 points
lieret
10 months ago
discuss
3.
Show HN: Agent Benchmark Repository and Viewer
(explorer.invariantlabs.ai)
2 points
marcfisc
2 years ago
discuss
4.
MiniMax M2.5 is beating Claude Opus 4.6 and MiniMax is 17x-20x cheaper
(swebench.com)
6 points
thelinuxkid
3 months ago
9 comments
5.
Show HN: Randomly switching between LMs at every step boosts SWE-bench score
(swebench.com)
5 points
lieret
9 months ago
1 comment
6.
SWE-bench just published an updated list of top AI Agents
(swebench.com)
4 points
laxyz
a year ago
discuss
7.
Amazon Q Developer Agent is now SOTA on SWE-bench
(swebench.com)
4 points
brendanfalk
2 years ago
discuss
8.
New leader on swe-bench multimodal
(swebench.com)
3 points
katrin777
a year ago
discuss
9.
Refact.ai is the new open-source SOTA on SWE-bench Verified and Lite
(swebench.com)
3 points
bystrakowa
a year ago
discuss
10.
New #1 SOTA on Swe-bench is using Claude 3.7 and O1
(swebench.com)
3 points
knes
a year ago
discuss
11.
SWE-Bench Can Language Models Resolve Real-World GitHub Issues?
(swebench.com)
3 points
EvgeniyZh
3 years ago
discuss
12.
Gru.ai Got 35.67% on SWEbench
(swebench.com)
2 points
BabelCLoud
2 years ago
discuss
13.
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?
(swebench.com)
2 points
cjsaltlake
3 years ago
discuss
14.
SWE-bench
(swebench.com)
1 point
katrin777
a year ago
discuss
15.
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?
(swebench.com)
1 point
goranmoomin
2 years ago
discuss
16.
Can Language Models Resolve Real-World GitHub Issues?
(swebench.com)
1 point
throw2321
3 years ago
discuss