Heykuki News

961.
An Nginx Engineer Took over AI's Benchmark Tool (github.com/hongzhidao)
2 points
zhidao9
4 months ago
discuss
962.
KiteSQL: Rust-native embedded SQL with TPC-C benchmarks and WASM support (github.com/KipData)
2 points
Jacques2Marais
4 months ago
discuss
963.
WorkBench-Pro – PC benchmark designed for developer workflows (github.com/johanmcad)
2 points
johanmcad
4 months ago
discuss
964.
Benchmark Comparison: JSONL vs. TOON output for JSON-render efficiency (github.com/vercel-labs)
2 points
lafalce
5 months ago
discuss
965.
Show HN: Rerankers – Models, benchmarks, and papers for RAG (github.com/agentset-ai)
2 points
midamurat
5 months ago
discuss
966.
Show HN: sc-membench for modern memory bandwidth and latency benchmarks (github.com/spareCores)
2 points
daroczig
5 months ago
discuss
967.
Show HN: Long-horizon LLM coherence benchmark (500 cycles) (zenodo.org)
2 points
teugent
5 months ago
discuss
968.
Epiplexity to Beat DeepMind's Alchemy Meta RL Benchmark (github.com/RandMan444)
2 points
Phillip98798
5 months ago
discuss
969.
Show HN: JSONBench, a Benchmark for Data Analytics on JSON (github.com/ClickHouse)
2 points
saisrirampur
5 months ago
discuss
970.
Stop benchmarking LLMs. Make them fight (github.com/AGI-Eval-Official)
2 points
jinqueeny
5 months ago
discuss
971.
Show HN: Sigma Runtime – 550-cycle identity stability benchmark on GPT-5.2 (github.com/sigmastratum)
2 points
teugent
6 months ago
discuss
972.
Benchmarking LLMs on whether they can play FizzBuzz (github.com/venkatasg)
2 points
_venkatasg
6 months ago
discuss
973.
Running a 270M LLM on Android (architecture and benchmarks)
2 points
ayushranjan99
7 months ago
discuss
974.
TypeNet Benchmark for development of authentication keystroke technologies (github.com/BiDAlab)
2 points
mooreds
8 months ago
discuss
975.
AutoCodeBench: Large Language Models Are Automatic Code Benchmark Generators (github.com/Tencent-Hunyuan)
2 points
ngrilly
9 months ago
discuss
976.
Show HN: Little Fluffy Clouds: Combine a bunch of small adjacent networks (github.com/kstrauser)
2 points
kstrauser
9 months ago
discuss
977.
Behavior: Robot manipulation benchmark based on 1000 household tasks (github.com/StanfordVL)
2 points
transpute
9 months ago
discuss
978.
Show HN: LLM‑Simple‑Eval – Easily Benchmark LLMs for Your Use Case (github.com/grigio)
2 points
grigio
9 months ago
discuss
979.
PostgreSQL vs. ClickHouse: Learnings from building my first database benchmark (github.com/514-labs)
2 points
oatsandsugar
10 months ago
discuss
980.
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com)
2 points
lieret
10 months ago
discuss
981.
Show HN: VDBbench 1.0: open-source benchmarking for VectorDBs (github.com/zilliztech)
2 points
Fendy
a year ago
discuss
982.
MAIR: A Benchmark for Evaluating Instructed Retrieval (github.com/sunnweiwei)
2 points
fzliu
a year ago
discuss
983.
Show HN: Comprehensive Benchmark Suite for Story Visualization (github.com/ViStoryBench)
2 points
hzwer
a year ago
discuss
984.
Show HN: Benchmarks agree with the complexity analysis of the TopoSort algorithm (github.com/williamw520)
2 points
ww520
a year ago
discuss
985.
Show HN: I built an open-source benchmark that evaluates LLMs through gameplay (llmshowdown.io)
2 points
jmogi
a year ago
discuss
986.
QuickBench: A Zero-Dependency Linux Benchmark for CPU, Memory, and Storage (github.com/bearstech)
2 points
kadrek
a year ago
discuss
987.
Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs (github.com/lechmazur)
2 points
amichail
a year ago
discuss
988.
Latest Benchmarks Show 10x Faster Prefix Queries vs. Etcd
2 points
absolute7
a year ago
discuss
989.
C++ Showing std::swap faster than XOR trick to swap numbers via naive benchmark (github.com/vladov3000)
2 points
signa11
2 years ago
discuss
990.
Benchmarks Comparing PyTorch and MLX on Apple Silicon GPUs (github.com/LucasSte)
2 points
tosh
2 years ago
discuss