Heykuki News

TopNewBestAskShowJobs
481.
Show HN: BloonsBench – Evaluate agent performance on Bloons Tower Defense 5 (github.com/cnqso)
1 point
cnqso
3 months ago
1 comment
482.
RAGScore – Evaluate RAG pipelines in 2 commands, works offline with Ollama (github.com/HZYAI)
1 point
antitopquark16
3 months ago
1 comment
483.
ZIO-OpenFeature – Feature Flag Evaluation for Scala with ZIO (github.com/EtaCassiopeia)
1 point
zainalpour
4 months ago
1 comment
484.
Show HN: Eval based agent builder (pls roast us) (github.com/seer-engg)
1 point
akshay326
6 months ago
1 comment
485.
SigmaEval – statistical evaluation for GenAI apps (github.com/Itura-AI)
1 point
TarekOraby
8 months ago
1 comment
486.
AgentTrace – Open-Source Tracing and Evaluation for AI Agents by TensorStax (github.com/tensorstax)
1 point
ariaattar
a year ago
1 comment
487.
Dingo: A Comprehensive Data Quality Evaluation Tool (github.com/DataEval)
1 point
e06084
a year ago
1 comment
488.
ModelClash: Dynamic LLM Evaluation Through AI Duels (github.com/mrconter1)
1 point
AIReach
2 years ago
1 comment
489.
Haveged being evaluated by AI models (github.com/jirka-h)
1 point
ernststraeumli
2 years ago
1 comment
490.
Promptbench: A Unified Library for Evaluating and Understanding LLMs (github.com/microsoft)
1 point
Flux159
2 years ago
1 comment
491.
ToolBench: An evaluation suite for LLM tool manipulation capabilities (github.com/sambanova)
1 point
xuqiantong
3 years ago
1 comment
492.
Conan does not evaluate joint compatibility of version requirements by design (github.com/conan-io)
1 point
duffsdevice
4 years ago
1 comment
493.
Torch-metrics: a model evaluation package for PyTorch (github.com/enochkan)
1 point
kanxx030
6 years ago
1 comment
494.
How to Evaluate Your Career (github.com/kthejoker)
1 point
kthejoker2
6 years ago
1 comment
495.
Release 1.3.0 of Expr expression evaluation library (github.com/antonmedv)
1 point
medv
7 years ago
1 comment
496.
SMJSON: a homoiconic and “self evaluating” format of JSON (github.com/udexon)
1 point
pplonski86
7 years ago
1 comment
497.
Quine Eval Server – An Experiment (gist.github.com)
1 point
jlturner
11 years ago
discuss
498.
Fexl now using purely functional evaluation (github.com/chkoreff)
1 point
fexl
12 years ago
discuss
499.
New Fexl Release (default is eager evaluation instead of lazy) (github.com/chkoreff)
1 point
fexl
12 years ago
discuss
500.
Ruby evolution – class_eval-ing class_eval (github.com/jumph4x)
1 point
westonplatter31
12 years ago
discuss
501.
Localeval: Evaluate a string of JS code without access to the global object (github.com/espadrine)
1 point
Signez
13 years ago
discuss
502.
Stabilizer: Statistically Rigorous Performance Evaluation (github.com/ccurtsinger)
1 point
arnsholt
13 years ago
discuss
503.
Show HN: Texas Hold'em hand evaluator for node.js (github.com/decs)
1 point
decs
13 years ago
discuss
504.
LLM INQUISITOR: Evaluating how AI models handle long, realistic tasks (github.com/AssimilatedHuman)
1 point
ballista2026
19 days ago
discuss
505.
Show HN: TweakIdea – 14-dimension startup idea evaluation in Claude Code (github.com/eph5xx)
1 point
ephx
2 months ago
discuss
506.
Show HN: Evaluate Python functions at their singularities (github.com/FWDhr)
1 point
calculusmachine
2 months ago
discuss
507.
Show HN: 2500 vision benchmarks / evals for Vision Language Models (github.com/Overshoot-ai)
1 point
zakariaelhjouji
2 months ago
discuss
508.
Show HN: An agent skill for eval-driven development of LLM-powered app (github.com/yiouli)
1 point
yol
3 months ago
discuss
509.
ReqIf OPA SARIF – CI/CD semantically evaluated policy gates (github.com/PromptExecution)
1 point
elasticventures
3 months ago
discuss
510.
Show HN: Vibe Coding Review Checklist – Evaluate AI-Generated Code Quality (github.com/aiqualitylab)
1 point
LetsAutomate
4 months ago
discuss