Heykuki News
961.
▲
An Nginx Engineer Took over AI's Benchmark Tool
(github.com/hongzhidao)
2 points
zhidao9
4 months ago
discuss
962.
▲
KiteSQL: Rust-native embedded SQL with TPC-C benchmarks and WASM support
(github.com/KipData)
2 points
Jacques2Marais
4 months ago
discuss
963.
▲
WorkBench-Pro – PC benchmark designed for developer workflows
(github.com/johanmcad)
2 points
johanmcad
4 months ago
discuss
964.
▲
Benchmark Comparison: JSONL vs. TOON output for JSON-render efficiency
(github.com/vercel-labs)
2 points
lafalce
5 months ago
discuss
965.
▲
Show HN: Rerankers – Models, benchmarks, and papers for RAG
(github.com/agentset-ai)
2 points
midamurat
5 months ago
discuss
966.
▲
Show HN: sc-membench for modern memory bandwidth and latency benchmarks
(github.com/spareCores)
2 points
daroczig
5 months ago
discuss
967.
▲
Show HN: Long-horizon LLM coherence benchmark (500 cycles)
(zenodo.org)
2 points
teugent
5 months ago
discuss
968.
▲
Epiplexity to Beat DeepMind's Alchemy Meta RL Benchmark
(github.com/RandMan444)
2 points
Phillip98798
5 months ago
discuss
969.
▲
Show HN: JSONBench, a Benchmark for Data Analytics on JSON
(github.com/ClickHouse)
2 points
saisrirampur
5 months ago
discuss
970.
▲
Stop benchmarking LLMs. Make them fight
(github.com/AGI-Eval-Official)
2 points
jinqueeny
5 months ago
discuss
971.
▲
Show HN: Sigma Runtime – 550-cycle identity stability benchmark on GPT-5.2
(github.com/sigmastratum)
2 points
teugent
6 months ago
discuss
972.
▲
Benchmarking LLMs on whether they can play FizzBuzz
(github.com/venkatasg)
2 points
_venkatasg
6 months ago
discuss
973.
▲
Running a 270M LLM on Android (architecture and benchmarks)
2 points
ayushranjan99
7 months ago
discuss
974.
▲
TypeNet Benchmark for development of authentication keystroke technologies
(github.com/BiDAlab)
2 points
mooreds
8 months ago
discuss
975.
▲
AutoCodeBench: Large Language Models Are Automatic Code Benchmark Generators
(github.com/Tencent-Hunyuan)
2 points
ngrilly
9 months ago
discuss
976.
▲
Show HN: Little Fluffy Clouds: Combine a bunch of small adjacent networks
(github.com/kstrauser)
2 points
kstrauser
9 months ago
discuss
977.
▲
Behavior: Robot manipulation benchmark based on 1000 household tasks
(github.com/StanfordVL)
2 points
transpute
9 months ago
discuss
978.
▲
Show HN: LLM‑Simple‑Eval – Easily Benchmark LLMs for Your Use Case
(github.com/grigio)
2 points
grigio
9 months ago
discuss
979.
▲
PostgreSQL vs. ClickHouse: Learnings from building my first database benchmark
(github.com/514-labs)
2 points
oatsandsugar
10 months ago
discuss
980.
▲
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds
(swebench.com)
2 points
lieret
10 months ago
discuss
981.
▲
Show HN: VDBbench 1.0: open-source benchmarking for VectorDBs
(github.com/zilliztech)
2 points
Fendy
a year ago
discuss
982.
▲
MAIR: A Benchmark for Evaluating Instructed Retrieval
(github.com/sunnweiwei)
2 points
fzliu
a year ago
discuss
983.
▲
Show HN: Comprehensive Benchmark Suite for Story Visualization
(github.com/ViStoryBench)
2 points
hzwer
a year ago
discuss
984.
▲
Show HN: Benchmarks agree with the complexity analysis of the TopoSort algorithm
(github.com/williamw520)
2 points
ww520
a year ago
discuss
985.
▲
Show HN: I built an open-source benchmark that evaluates LLMs through gameplay
(llmshowdown.io)
2 points
jmogi
a year ago
discuss
986.
▲
QuickBench: A Zero-Dependency Linux Benchmark for CPU, Memory, and Storage
(github.com/bearstech)
2 points
kadrek
a year ago
discuss
987.
▲
Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs
(github.com/lechmazur)
2 points
amichail
a year ago
discuss
988.
▲
Latest Benchmarks Show 10x Faster Prefix Queries vs. Etcd
2 points
absolute7
a year ago
discuss
989.
▲
C++ Showing std::swap faster than XOR trick to swap numbers via naive benchmark
(github.com/vladov3000)
2 points
signa11
2 years ago
discuss
990.
▲
Benchmarks Comparing PyTorch and MLX on Apple Silicon GPUs
(github.com/LucasSte)
2 points
tosh
2 years ago
discuss