Heykuki News
Top
New
Best
Ask
Show
Jobs
61.
Clojure protected REPL
(github.com/stacksideflow)
3 points
stacksideflow2
7 years ago
discuss
62.
Evaluate Selections in Sublime Text
(github.com/jbrooksuk)
2 points
jbrooksuk
13 years ago
2 comments
63.
Estonia publishes its e-voting source code
(github.com/vvk-ehk)
2 points
duggieawesome
13 years ago
1 comment
64.
Evaluation of Various MLX Quantizations
(github.com/deepsweet)
2 points
d-_-b
18 days ago
1 comment
65.
Should we chaos test our agents?
(github.com/Corbell-AI)
2 points
himmi-01
21 days ago
1 comment
66.
Open-source LLM-as-judge eval suite with root cause analysis and failure mining
(github.com/colingfly)
2 points
colinfly
3 months ago
1 comment
67.
Evaluating LLMs with CommonGen-Lite
(github.com/allenai)
2 points
georgehill
2 years ago
1 comment
68.
Evals Skills for AI Agents
(github.com/latitude-dev)
2 points
paulaq
a month ago
discuss
69.
Show HN: Claude Code skills for building LLM evals
(github.com/latitude-dev)
2 points
paulaq
a month ago
discuss
70.
Show HN: LLM‑Simple‑Eval – Easily Benchmark LLMs for Your Use Case
(github.com/grigio)
2 points
grigio
9 months ago
discuss
71.
Evaluating Large Language Models Using LLM-as-a-Judge
(github.com/aws-samples)
2 points
mooreds
2 years ago
discuss
72.
GPT-4-turbo-2024-04-09 "wins" simple evals benchmark
(github.com/openai)
2 points
zurfer
2 years ago
discuss
73.
A survey on evaluation of large language models
(github.com/MLGroupJLU)
2 points
hhs
3 years ago
discuss
74.
OpenFF – Automated estimation of physical properties
(github.com/openforcefield)
2 points
alex_hirner
5 years ago
discuss
75.
Show HN: IR_evaluation – Information retrieval evaluation metrics in pure Python
(github.com/plurch)
1 point
plurch
a year ago
2 comments
76.
Pulze AI Evals
(github.com/pulzeai-oss)
1 point
fbnbr
a year ago
1 comment
77.
BSD_Evals: Open-source LLM evaluation tool
(github.com/brettdidonato)
1 point
bsdpython
2 years ago
1 comment
78.
AgentSafeLabs – Launched Open-source Security framework for AI agents
(github.com/AgentSafeLabs)
1 point
waqarjaved
9 days ago
discuss
79.
Show HN: EleutherAI / Lm-Evaluation-Harness
(github.com/EleutherAI)
1 point
marvinified
23 days ago
discuss
80.
Webgrid Eval: LLM vision + tool-use on Neuralink's cursor control task
(github.com/ofou)
1 point
ofou
3 months ago
discuss
81.
Network Evaluation Service
(github.com/hendemic)
1 point
gregsadetsky
a year ago
discuss
82.
OpenAI: Simple-Evals
(github.com/openai)
1 point
tosh
2 years ago
discuss
83.
ReactEval: Evaluating LLMs on front-end code generation
(github.com/gitwitorg)
1 point
jamesmurdza
2 years ago
discuss
84.
Language Model Evaluation Harness
(github.com/EleutherAI)
1 point
tosh
3 years ago
discuss
85.
Nextdoor's Cloud Security Posture Management (CSPM) Evaluation Matrix
(github.com/Nextdoor)
1 point
scapecast
3 years ago
discuss
86.
Show HN: EvalGPT – Code interpreter and agent framework inspired by Google Borg
(github.com/index-labs)
1 point
jiayuanzhang
3 years ago
discuss
87.
Trait-Eval – Rust
(github.com/doctorn)
1 point
blopeur
6 years ago
discuss
88.
Show HN: Little tool to evaluate your cryptocurrency trades on Poloniex
(github.com/enricobacis)
1 point
enricobacis
9 years ago
discuss
89.
Show HN: Freeact – A Lightweight Library for Code-Action Based Agents
(github.com/gradion-ai)
122 points
cstub
a year ago
5 comments
90.
Show HN: Ellipsis – Automated PR reviews and bug fixes
(ellipsis.dev)
121 points
hunterbrooks
2 years ago
64 comments