Search: github.com/bendc | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

931.

BrowseComp-Plus: A More Fair and Transparent Benchmark of Deep-Research Agent (github.com/texttron)

2 points

colonCapitalDee

4 days ago

932.

Show HN: AgentThreatBench – Benchmark for AI Agent Memory Security (github.com/OWASP)

2 points

8 days ago

933.

Prompter – Compare and benchmark Ollama models side-by-side in your terminal (github.com/whonixnetworks)

2 points

14 days ago

934.

Show HN: 97% on SWE-bench Verified with subscription-token agents (github.com/kimjune01)

2 points

16 days ago

935.

Show HN: Verdict – model evals on your own data, not someone else's benchmark (github.com/aevyraai)

2 points

a month ago

936.

talkie-coder: From 1930 to SWE-bench (github.com/RicardoDominguez)

2 points

a month ago

937.

Open macro placement benchmark and $20k challenge (HRT-sponsored) (github.com/partcleda)

2 points

2 months ago

938.

Show HN: WMB-100K – Open benchmark for AI memory systems at 100K turns (github.com/Irina1920)

2 points

2 months ago

939.

Show HN: OpenClaw Arena – Benchmark models on real tasks, rank by perf and cost (app.uniclaw.ai)

2 points

2 months ago

940.

An open source benchmarking framework for IT automation (github.com/itbench-hub)

2 points

2 months ago

941.

Mitata: Benchmark tooling that loves you (github.com/evanwashere)

2 points

3 months ago

942.

Help me improve this benchmark for vector engines (github.com/M4iKZ)

2 points

3 months ago

943.

Some critical issues with the SWE-bench-Pro environments (github.com/SWE-agent)

2 points

3 months ago

944.

BetterKV – A multithreaded Rust Redis alternative, 10-30x faster in benchmarks

2 points

3 months ago

945.

Show HN: ModelSweep - Open-Source Benchmarking for Local LLMs (github.com/leonickson1)

2 points

3 months ago

946.

FratBench – Social Calibration Benchmark (OAI Scores Dead Last) [pdf] (github.com/richar-wang)

2 points

3 months ago

947.

TLAi+ Benchmarks for Evaluating LLMs (github.com/tlaplus)

2 points

3 months ago

948.

An Nginx Engineer Took over AI's Benchmark Tool (github.com/hongzhidao)

2 points

4 months ago

949.

KiteSQL: Rust-native embedded SQL with TPC-C benchmarks and WASM support (github.com/KipData)

2 points

4 months ago

950.

WorkBench-Pro – PC benchmark designed for developer workflows (github.com/johanmcad)

2 points

4 months ago

951.

Benchmark Comparison: JSONL vs. TOON output for JSON-render efficiency (github.com/vercel-labs)

2 points

5 months ago

952.

Show HN: Rerankers – Models, benchmarks, and papers for RAG (github.com/agentset-ai)

2 points

5 months ago

953.

Show HN: sc-membench for modern memory bandwidth and latency benchmarks (github.com/spareCores)

2 points

5 months ago

954.

Show HN: Long-horizon LLM coherence benchmark (500 cycles) (zenodo.org)

2 points

5 months ago

955.

Epiplexity to Beat DeepMind's Alchemy Meta RL Benchmark (github.com/RandMan444)

2 points

5 months ago

956.

Show HN: JSONBench, a Benchmark for Data Analytics on JSON (github.com/ClickHouse)

2 points

5 months ago

957.

Stop benchmarking LLMs. Make them fight (github.com/AGI-Eval-Official)

2 points

5 months ago

958.

Show HN: Sigma Runtime – 550-cycle identity stability benchmark on GPT-5.2 (github.com/sigmastratum)

2 points

6 months ago

959.

Benchmarking LLMs on whether they can play FizzBuzz (github.com/venkatasg)

2 points

6 months ago

960.

Running a 270M LLM on Android (architecture and benchmarks)

2 points

7 months ago