Heykuki News

91.
Launch HN: Relari (YC W24) – Identify the root cause of problems in LLM apps
106 points
antonap
2 years ago
15 comments
92.
Show HN: Web-eval-agent – Let the coding agent debug itself (github.com/Operative-Sh)
84 points
neversettles
a year ago
12 comments
93.
Show HN: Ellipsis – Automatic pull request reviews (ellipsis.dev)
18 points
hunterbrooks
2 years ago
11 comments
94.
Show HN: Honcho – Open-source memory infrastructure, powered by custom models (github.com/plastic-labs)
8 points
vvoruganti
4 months ago
discuss
95.
Bad MCP design costs your agent 5x more tokens
6 points
JohnnyZhang483
19 hours ago
discuss
96.
Show HN: Agent Tinman – Autonomous failure discovery for LLM systems (github.com/oliveskin)
4 points
oliveskin
4 months ago
discuss
97.
Show HN: Open Operator Evals – real-world benchmarks for LLM web agents (github.com/nottelabs)
3 points
monoid73
a year ago
1 comment
98.
Show HN: PromptProof – CI gate for LLM outputs (schema/regex/cost; no API keys) (news.ycombinator.com)
2 points
geminimir
9 months ago
discuss
99.
Show HN: I made web agents reliable with smaller LLMs via natural language (github.com/nottelabs)
2 points
giordanol
a year ago
discuss
100.
Deprecating A/B tests with offline policy evaluation
1 point
econti
5 years ago
discuss
101.
Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs (github.com/darkrishabh)
79 points
darkrishabh
a month ago
37 comments
102.
Show HN: Continuous-eval – Granular evaluation of GenAI pipelines (github.com/relari-ai)
10 points
antonap
2 years ago
2 comments
103.
Show HN: I designed a ChatGPT prompt evaluator to ruin your fun;) (github.com/alignedai)
8 points
buildaligned
3 years ago
1 comment
104.
Show HN: Image Eval – An evaluation toolkit for image generation models (github.com/Storia-AI)
7 points
nutellalover
3 years ago
discuss
105.
Open RAG Eval (github.com/vectara)
6 points
TastyLamps
a year ago
1 comment
106.
In a sample of >1000 games, GPT-3.5-turbo-instruct plays chess with ~1800 elo (github.com/adamkarvonen)
4 points
sebzim4500
3 years ago
4 comments
107.
Show HN: Eval.js – a JavaScript interpreter written in JavaScript (github.com/marten-de-vries)
4 points
marten-de-vries
11 years ago
1 comment
108.
Open Game Eval: an eval for agentic Lua game development in Roblox (github.com/Roblox)
3 points
kartayyar
6 months ago
discuss
109.
Show HN: TypeScript type-level math expression parser and evaluator (github.com/dqbd)
3 points
dqbd
3 years ago
discuss
110.
GPT4 Learning from Reflection (github.com/GammaTauAI)
3 points
agomez314
3 years ago
discuss
111.
Can LLMs accurately evaluate their own confidence? (github.com/anerli)
2 points
anerli
a year ago
2 comments
112.
Show HN: CLI tool to analyze your Vector Embeddings! (github.com/dakshjain-1616)
2 points
gauravvij137
4 months ago
1 comment
113.
Show HN: OpenSciEval-AI Deriving Prime Theorem from Chaos (github.com/maris205)
2 points
mairswang
6 months ago
1 comment
114.
Show HN: PromptProof – CI gate for LLM outputs (schema/regex/cost; no API keys) (github.com/marketplace)
2 points
geminimir
10 months ago
1 comment
115.
Keyboard Layout Evaluation (github.com/bclnr)
2 points
Egoist
4 years ago
1 comment
116.
Evaluation Code – GPT-5 on Multimodal Medical Reasoning (github.com/wangshansong1)
2 points
Topfi
9 months ago
discuss
117.
Opensource operators evals (github.com/nottelabs)
2 points
kernelito
a year ago
discuss
118.
Show HN: Python library to run a “function” over a set of data via ChatGPT (github.com/TylerGlaiel)
2 points
TylerGlaiel
3 years ago
discuss
119.
Show HN: Spark-LLM-eval – Distributed LLM evaluation for Spark (github.com/bassrehab)
1 point
subhadipmitra
6 months ago
1 comment
120.
LLM-eval-kit: Distributed LLM evaluation framework (v0.3.0) (github.com/benmeryem-tech)
1 point
benmeryem_ai
a month ago
discuss