Heykuki News
Top
New
Best
Ask
Show
Jobs
31.
Show HN: CoJudge – open-source, offline judge for studying LC-style problems
(github.com/cojudge)
2 points
ansliy
7 months ago
discuss
32.
Evaluating Large Language Models Using LLM-as-a-Judge
(github.com/aws-samples)
2 points
mooreds
2 years ago
discuss
33.
Coderunner – A judge for your programs, run and test your programs through Python
(github.com/codeclassroom)
2 points
bhupesh
7 years ago
discuss
34.
Show HN: A command line interface to UVA online judge (competitive programming)
(github.com/scvalencia)
2 points
scvalencia
10 years ago
discuss
35.
Show HN: Claude-relais – A plan/build/judge loop mixing Claude with Cursor
(github.com/clementrog)
1 point
crog
4 months ago
discuss
36.
Precision-Based Sampling of LLM Judges
(sunnybak.net)
1 point
sunny-bak
a year ago
discuss
37.
Show HN: Lone Arena – Self-hosted LLM human evaluation, you be the judge
(github.com/Contextualist)
1 point
Contextualist
2 years ago
discuss
38.
Collection of TypeScript type challenges with online judge
(github.com/type-challenges)
1 point
max-m
2 years ago
discuss
39.
Show HN: A self hosted online judge for meetups and workshops, written in Go
(github.com/MohamedBassem)
1 point
mohamedbassem
9 years ago
discuss
40.
Show HN: Minimal, self-hosted exercise tracker
(github.com/bmtwl)
127 points
DrPhish
a year ago
39 comments
41.
Show HN: Terminal-Bench-RL: Training long-horizon terminal agents with RL
(github.com/Danau5tin)
125 points
Danau5tin
10 months ago
12 comments
42.
Launch HN: Confident AI (YC W25) – Open-source evaluation framework for LLM apps
117 points
jeffreyip
a year ago
27 comments
43.
Show HN: SirixDB – Bitemporal binary JSON database system and event store
(github.com/sirixdb)
109 points
lichtenberger
3 years ago
16 comments
44.
Launch HN: Traceloop (YC W23) – Detecting LLM Hallucinations with OpenTelemetry
101 points
GalKlm
2 years ago
72 comments
45.
Show HN: Index – New Open Source browser agent
(github.com/lmnr-ai)
98 points
skull8888888
a year ago
45 comments
46.
Show HN: RULER – Easily apply RL to any agent
(openpipe.ai)
81 points
kcorbitt
a year ago
11 comments
47.
Show HN: Torrix, self hosted, LLM Observability (no Postgres, no Redis)
(github.com/torrix-ai)
74 points
AdarshRao23
23 days ago
4 comments
48.
Show HN: OCR Benchmark Focusing on Automation
(nanonets.com)
58 points
prats226
a year ago
21 comments
49.
Show HN: TensorZero – open-source data and learning flywheel for LLMs
(github.com/tensorzero)
49 points
GabrielBianconi
2 years ago
2 comments
50.
Show HN: Helicone (YC W23) – OSS LLM Observability and Development Platform
(github.com/Helicone)
29 points
justintorre75
a year ago
7 comments
51.
Show HN: Create LLM graders and run evals in JavaScript with one file
(github.com/bolt-foundry)
28 points
randall
a year ago
2 comments
52.
Show HN: OSS sustain guard – Sustainability signals for OSS dependencies
(onukura.github.io)
21 points
onukura
5 months ago
6 comments
53.
Show HN: Anytype – a local and collaborative database with API and MCP server
(zhanna.any.org)
20 points
sharipova
a year ago
discuss
54.
Show HN: I built an open-source AI data layer that connects any LLM to any data
(github.com/bagofwords1)
18 points
y14
8 months ago
3 comments
55.
Show HN: TinyFish Web Agent (82% on hard tasks vs. Operator's 43%)
(tinyfish.ai)
17 points
gargi_tinyfish
4 months ago
12 comments
56.
Show HN: Meta-agent: self-improving agent harnesses from live traces
(github.com/canvas-org)
14 points
essamsleiman
2 months ago
discuss
57.
Show HN: Ebiose – A Darwin‑Style Playground for Self‑Evolving AI Agents
(github.com/ebiose-ai)
12 points
vincent-ebiose
a year ago
3 comments
58.
Show HN: OpenTiger – Autonomous dev orchestration that never stops
(github.com/Andyyyy64)
11 points
andyyyy64
3 months ago
2 comments
59.
Show HN: Kiln – AI Boilerplate with Evals, Fine-Tuning, Synthetic Data, and Git
(github.com/Kiln-AI)
10 points
scosman
10 months ago
1 comment
60.
Show HN: Unsiloed AI – #1 on olmOCR-Bench
9 points
adnan9999
10 days ago
4 comments