Heykuki News

Top | New | Best | Ask | Show | Jobs
61.
Clojure protected REPL (github.com/stacksideflow)
3 points
stacksideflow2
7 years ago
discuss
62.
Evaluate Selections in Sublime Text (github.com/jbrooksuk)
2 points
jbrooksuk
13 years ago
2 comments
63.
Estonia publishes its e-voting source code (github.com/vvk-ehk)
2 points
duggieawesome
13 years ago
1 comment
64.
Evaluation of Various MLX Quantizations (github.com/deepsweet)
2 points
d-_-b
18 days ago
1 comment
65.
Should we chaos test our agents? (github.com/Corbell-AI)
2 points
himmi-01
21 days ago
1 comment
66.
Open-source LLM-as-judge eval suite with root cause analysis and failure mining (github.com/colingfly)
2 points
colinfly
3 months ago
1 comment
67.
Evaluating LLMs with CommonGen-Lite (github.com/allenai)
2 points
georgehill
2 years ago
1 comment
68.
Evals Skills for AI Agents (github.com/latitude-dev)
2 points
paulaq
a month ago
discuss
69.
Show HN: Claude Code skills for building LLM evals (github.com/latitude-dev)
2 points
paulaq
a month ago
discuss
70.
Show HN: LLM‑Simple‑Eval – Easily Benchmark LLMs for Your Use Case (github.com/grigio)
2 points
grigio
9 months ago
discuss
71.
Evaluating Large Language Models Using LLM-as-a-Judge (github.com/aws-samples)
2 points
mooreds
2 years ago
discuss
72.
GPT-4-turbo-2024-04-09 "wins" simple evals benchmark (github.com/openai)
2 points
zurfer
2 years ago
discuss
73.
A survey on evaluation of large language models (github.com/MLGroupJLU)
2 points
hhs
3 years ago
discuss
74.
OpenFF – Automated estimation of physical properties (github.com/openforcefield)
2 points
alex_hirner
5 years ago
discuss
75.
Show HN: IR_evaluation – Information retrieval evaluation metrics in pure Python (github.com/plurch)
1 point
plurch
a year ago
2 comments
76.
Pulze AI Evals (github.com/pulzeai-oss)
1 point
fbnbr
a year ago
1 comment
77.
BSD_Evals: Open-source LLM evaluation tool (github.com/brettdidonato)
1 point
bsdpython
2 years ago
1 comment
78.
AgentSafeLabs – Launched Open-source Security framework for AI agents (github.com/AgentSafeLabs)
1 point
waqarjaved
9 days ago
discuss
79.
Show HN: EleutherAI / Lm-Evaluation-Harness (github.com/EleutherAI)
1 point
marvinified
23 days ago
discuss
80.
Webgrid Eval: LLM vision + tool-use on Neuralink's cursor control task (github.com/ofou)
1 point
ofou
3 months ago
discuss
81.
Network Evaluation Service (github.com/hendemic)
1 point
gregsadetsky
a year ago
discuss
82.
OpenAI: Simple-Evals (github.com/openai)
1 point
tosh
2 years ago
discuss
83.
ReactEval: Evaluating LLMs on front-end code generation (github.com/gitwitorg)
1 point
jamesmurdza
2 years ago
discuss
84.
Language Model Evaluation Harness (github.com/EleutherAI)
1 point
tosh
3 years ago
discuss
85.
Nextdoor's Cloud Security Posture Management (CSPM) Evaluation Matrix (github.com/Nextdoor)
1 point
scapecast
3 years ago
discuss
86.
Show HN: EvalGPT – Code interpreter and agent framework inspired by Google Borg (github.com/index-labs)
1 point
jiayuanzhang
3 years ago
discuss
87.
Trait-Eval – Rust (github.com/doctorn)
1 point
blopeur
6 years ago
discuss
88.
Show HN: Little tool to evaluate your cryptocurrency trades on Poloniex (github.com/enricobacis)
1 point
enricobacis
9 years ago
discuss
89.
Show HN: Freeact – A Lightweight Library for Code-Action Based Agents (github.com/gradion-ai)
122 points
cstub
a year ago
5 comments
90.
Show HN: Ellipsis – Automated PR reviews and bug fixes (ellipsis.dev)
121 points
hunterbrooks
2 years ago
64 comments