Heykuki News

1.
Show HN: LLM Function Calling Library to Interact with File, Shell, Git and Code (swekit.dev)
5 points
soham123
2 years ago
discuss
2.
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com)
2 points
lieret
10 months ago
discuss
3.
Show HN: Agent Benchmark Repository and Viewer (explorer.invariantlabs.ai)
2 points
marcfisc
2 years ago
discuss
4.
MiniMax M2.5 is beating Claude Opus 4.6 and MiniMax is 17x-20x cheaper (swebench.com)
6 points
thelinuxkid
3 months ago
9 comments
5.
Show HN: Randomly switching between LMs at every step boosts SWE-bench score (swebench.com)
5 points
lieret
9 months ago
1 comment
6.
SWE-bench just published an updated list of top AI Agents (swebench.com)
4 points
laxyz
a year ago
discuss
7.
Amazon Q Developer Agent is now SOTA on SWE-bench (swebench.com)
4 points
brendanfalk
2 years ago
discuss
8.
New leader on swe-bench multimodal (swebench.com)
3 points
katrin777
a year ago
discuss
9.
Refact.ai is the new open-source SOTA on SWE-bench Verified and Lite (swebench.com)
3 points
bystrakowa
a year ago
discuss
10.
New #1 SOTA on Swe-bench is using Claude 3.7 and O1 (swebench.com)
3 points
knes
a year ago
discuss
11.
SWE-Bench Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
3 points
EvgeniyZh
3 years ago
discuss
12.
Gru.ai Got 35.67% on SWEbench (swebench.com)
2 points
BabelCLoud
2 years ago
discuss
13.
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
2 points
cjsaltlake
3 years ago
discuss
14.
SWE-bench (swebench.com)
1 point
katrin777
a year ago
discuss
15.
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
1 point
goranmoomin
2 years ago
discuss
16.
Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
1 point
throw2321
3 years ago
discuss