Heykuki News

Top New Best Ask Show Jobs
751.
Show HN: CivBench a long-horizon AI benchmark for multi-agent games (clashai.live)
12 points
mbh159
3 months ago
24 comments
752.
Open-source LLM cascading, up to 92% cost savings on benchmarks (github.com/lemony-ai)
12 points
saschabuehrle
6 months ago
9 comments
753.
An honest analysis of SpacetimeDB 2.0's insane benchmark results (gist.github.com)
12 points
brandonpollack2
3 months ago
3 comments
754.
Show HN: A benchmark + latency sim for LLM db queries: ClickHouse / Postgres (github.com/514-labs)
12 points
oatsandsugar
10 months ago
3 comments
755.
Yahoo Cloud Serving Benchmark (wiki.github.com)
12 points
helwr
16 years ago
discuss
756.
Google/fuzzbench: Fuzzer benchmarking as a service (github.com/google)
11 points
edward
6 years ago
discuss
757.
A benchmark to compare synchronization techniques for multicore programming (github.com/gramoli)
11 points
wsmith
10 years ago
discuss
758.
HTTP benchmarking tool written in Crystal (github.com/Sdogruyol)
11 points
sdogruyol
11 years ago
discuss
759.
Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces (npmjs.com)
10 points
george_ciobanu
a month ago
2 comments
760.
Show HN: LLM Debate Benchmark (github.com/lechmazur)
9 points
zone411
2 months ago
3 comments
761.
Recursive grep written in Go benched against a C++ and Rust variant (github.com/bep)
9 points
bjornerik
a month ago
2 comments
762.
LLM Persuasion Benchmark: Multi-Turn Persuasion Between Models (github.com/lechmazur)
9 points
zone411
2 months ago
discuss
763.
You Do Not Need a Vector Database (For RAG): Benchmarking IR Methods
9 points
ylow
3 years ago
discuss
764.
Trivial PHP string concatenation benchmarks, proving time better spent elsewhere. (github.com/magnetikonline)
8 points
magnetikonline
12 years ago
6 comments
765.
Real-world benchmarks (gist.github.com)
8 points
geelen
13 years ago
2 comments
766.
Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty (github.com/lechmazur)
8 points
zone411
a year ago
1 comment
767.
OpenChat_8192 Beats ChatGPT-3.5 on Vicuna GPT-4 Benchmark
8 points
thibo_skabgia
3 years ago
1 comment
768.
A bunch of JavaScript idiosyncrasies to beginners (github.com/odykyi)
8 points
alexdykyi
8 years ago
1 comment
769.
Raspberry Pi httpd micro benchmark (gist.github.com)
8 points
mpg123
10 years ago
1 comment
770.
Show HN: LLM Creative Story‑Writing Benchmark V3 (github.com/lechmazur)
8 points
zone411
9 months ago
discuss
771.
Show HN: LLM Divergent Thinking Creativity Benchmark (github.com/lechmazur)
8 points
zone411
a year ago
discuss
772.
Show HN: Iron Cushion, a CouchDB benchmark and load testing tool (github.com/mgp)
8 points
shadowmatter
14 years ago
discuss
773.
Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of Python (github.com/SWE-agent)
7 points
lieret
10 months ago
4 comments
774.
A caffeine driven, simplistic approach to benchmarking Node.js code. (github.com/logicalparadox)
7 points
vesln
14 years ago
3 comments
775.
Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure (github.com/lechmazur)
7 points
zone411
a year ago
1 comment
776.
Show HN: LLM Deceptiveness and Gullibility Benchmark (github.com/lechmazur)
7 points
zone411
2 years ago
1 comment
777.
Engulf, A Graphical HTTP Benchmarker written in Clojure + D3.js (github.com/andrewvc)
7 points
andrewvc
14 years ago
1 comment
778.
Wrk – an HTTP benchmarking tool (github.com/wg)
7 points
jnazario
13 years ago
discuss
779.
Show HN: Get a report on your compliance to CIS Benchmarks (Azure and AWS) (github.com/4urcloud)
7 points
adrien4urcloud
2 years ago
discuss
780.
Show HN: Ben, your benchmarking assistant, written in Go (github.com/drish)
7 points
drish
8 years ago
discuss