Heykuki News
481.
Show HN: BloonsBench – Evaluate agent performance on Bloons Tower Defense 5
(github.com/cnqso)
1 point
cnqso
3 months ago
1 comment
482.
RAGScore – Evaluate RAG pipelines in 2 commands, works offline with Ollama
(github.com/HZYAI)
1 point
antitopquark16
3 months ago
1 comment
483.
ZIO-OpenFeature – Feature Flag Evaluation for Scala with ZIO
(github.com/EtaCassiopeia)
1 point
zainalpour
4 months ago
1 comment
484.
Show HN: Eval based agent builder (pls roast us)
(github.com/seer-engg)
1 point
akshay326
6 months ago
1 comment
485.
SigmaEval – statistical evaluation for GenAI apps
(github.com/Itura-AI)
1 point
TarekOraby
8 months ago
1 comment
486.
AgentTrace – Open-Source Tracing and Evaluation for AI Agents by TensorStax
(github.com/tensorstax)
1 point
ariaattar
a year ago
1 comment
487.
Dingo: A Comprehensive Data Quality Evaluation Tool
(github.com/DataEval)
1 point
e06084
a year ago
1 comment
488.
ModelClash: Dynamic LLM Evaluation Through AI Duels
(github.com/mrconter1)
1 point
AIReach
2 years ago
1 comment
489.
Haveged being evaluated by AI models
(github.com/jirka-h)
1 point
ernststraeumli
2 years ago
1 comment
490.
Promptbench: A Unified Library for Evaluating and Understanding LLMs
(github.com/microsoft)
1 point
Flux159
2 years ago
1 comment
491.
ToolBench: An evaluation suite for LLM tool manipulation capabilities
(github.com/sambanova)
1 point
xuqiantong
3 years ago
1 comment
492.
Conan does not evaluate joint compatibility of version requirements by design
(github.com/conan-io)
1 point
duffsdevice
4 years ago
1 comment
493.
Torch-metrics: a model evaluation package for PyTorch
(github.com/enochkan)
1 point
kanxx030
6 years ago
1 comment
494.
How to Evaluate Your Career
(github.com/kthejoker)
1 point
kthejoker2
6 years ago
1 comment
495.
Release 1.3.0 of Expr expression evaluation library
(github.com/antonmedv)
1 point
medv
7 years ago
1 comment
496.
SMJSON: a homoiconic and “self evaluating” format of JSON
(github.com/udexon)
1 point
pplonski86
7 years ago
1 comment
497.
Quine Eval Server – An Experiment
(gist.github.com)
1 point
jlturner
11 years ago
discuss
498.
Fexl now using purely functional evaluation
(github.com/chkoreff)
1 point
fexl
12 years ago
discuss
499.
New Fexl Release (default is eager evaluation instead of lazy)
(github.com/chkoreff)
1 point
fexl
12 years ago
discuss
500.
Ruby evolution – class_eval-ing class_eval
(github.com/jumph4x)
1 point
westonplatter31
12 years ago
discuss
501.
Localeval: Evaluate a string of JS code without access to the global object
(github.com/espadrine)
1 point
Signez
13 years ago
discuss
502.
Stabilizer: Statistically Rigorous Performance Evaluation
(github.com/ccurtsinger)
1 point
arnsholt
13 years ago
discuss
503.
Show HN: Texas Hold'em hand evaluator for node.js
(github.com/decs)
1 point
decs
13 years ago
discuss
504.
LLM INQUISITOR: Evaluating how AI models handle long, realistic tasks
(github.com/AssimilatedHuman)
1 point
ballista2026
19 days ago
discuss
505.
Show HN: TweakIdea – 14-dimension startup idea evaluation in Claude Code
(github.com/eph5xx)
1 point
ephx
2 months ago
discuss
506.
Show HN: Evaluate Python functions at their singularities
(github.com/FWDhr)
1 point
calculusmachine
2 months ago
discuss
507.
Show HN: 2500 vision benchmarks / evals for Vision Language Models
(github.com/Overshoot-ai)
1 point
zakariaelhjouji
2 months ago
discuss
508.
Show HN: An agent skill for eval-driven development of LLM-powered app
(github.com/yiouli)
1 point
yol
3 months ago
discuss
509.
ReqIf OPA SARIF – CI/CD semantically evaluated policy gates
(github.com/PromptExecution)
1 point
elasticventures
3 months ago
discuss
510.
Show HN: Vibe Coding Review Checklist – Evaluate AI-Generated Code Quality
(github.com/aiqualitylab)
1 point
LetsAutomate
4 months ago
discuss