Heykuki News
1.
Ask HN: Are you willing to contribute to OpenAI Evals?
8 points
nullptr_deref
3 years ago
5 comments
2.
Show HN: MCP Bridge – Access Local MCP Servers Remotely
(github.com/EvalsOne)
3 points
everfly
a year ago
1 comment
3.
Show HN: Iris – first MCP-native eval and observability tool for AI agents
(github.com/iris-eval)
1 point
iparent
3 months ago
discuss
4.
Show HN: Evalcraft – cassette-based testing for AI agents (pytest, $0/run)
(github.com/beyhangl)
1 point
beyhang
3 months ago
discuss
5.
Show HN: Visualize OpenAI Evals of GPT-4
(github.com/zeno-ml)
1 point
confutio
3 years ago
discuss
6.
Evals: a framework for evaluating OpenAI models and a registry of benchmarks
(github.com/openai)
123 points
tosh
3 years ago
16 comments
7.
Evals in 2025: going beyond simple benchmarks to build models people can use
(github.com/huggingface)
80 points
jxmorris12
9 months ago
8 comments
8.
Try out Clojure libraries via rebel-readline
(github.com/eval)
70 points
todsacerdoti
3 years ago
8 comments
9.
Show HN: Fast-agent – Compose MCP enabled Agents and Workflows in minutes
(github.com/evalstate)
29 points
evalstate
a year ago
3 comments
10.
eval_macro: A New Way to Write Rust Macros
(github.com/wdanilo)
9 points
W4G1
a year ago
discuss
11.
Show HN: Python lib to run evals across providers: OpenAI, Anthropic, etc.
(github.com/crizCraig)
8 points
cr4zy
2 years ago
1 comment
12.
Neo Emacs – A GPU-powered Emacs written in Rust with a modern display engine
(github.com/eval-exec)
7 points
agarttha
3 months ago
1 comment
13.
Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu
(github.com/eval-exec)
7 points
evalexec
4 months ago
discuss
14.
Show HN: Open-source dashboard for your domain experts to improve your AI Agents
(github.com/getevalkit)
5 points
mellowcookie
a year ago
discuss
15.
Exfiltrate Data with NTP
(github.com/evallen)
5 points
amony
4 years ago
discuss
16.
GPT-4 doesn't pay close attention to detail in some cases
(github.com/openai)
3 points
mcaledonensis
3 years ago
2 comments
17.
Source code for evaluating decoder-based models: GANs, GMMNs, and VAEs
(github.com/tonywu95)
3 points
Dim25
10 years ago
1 comment
18.
Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions
(github.com/AGI-Eval-Official)
3 points
jinqueeny
5 months ago
discuss
19.
Example PR to OpenAI evals to get GPT-4 early API access
(github.com/openai)
3 points
harrisonjackson
3 years ago
discuss
20.
EvalML: An AutoML library written in Python
(github.com/alteryx)
3 points
merqurio
5 years ago
discuss
21.
I hope to help you evaluate your GenAI App
(github.com/shihongDev)
2 points
shloveai
5 months ago
2 comments
22.
Show HN: EvalView – Catch agent regressions before you ship (pytest for agents)
(github.com/hidai25)
2 points
hidai25
5 months ago
1 comment
23.
Stop benchmarking LLMs. Make them fight
(github.com/AGI-Eval-Official)
2 points
jinqueeny
5 months ago
discuss
24.
Eval Protocol: RL for agents in any language, container, or framework
(github.com/eval-protocol)
2 points
dphuang2
7 months ago
discuss
25.
Automatic Evals for LLMs
(github.com/mlfoundations)
2 points
saikatsg
a year ago
discuss
26.
LLM Evaluation Guidebook
(github.com/huggingface)
2 points
erinys
2 years ago
discuss
27.
HuggingFace/evaluate: A library for easily evaluating ML models and datasets
(github.com/huggingface)
2 points
occamschainsaw
4 years ago
discuss
28.
Why Neutralinojs Is Better? Comparing with Electron and Node Webkit
(github.com/neutralinojs)
2 points
delvincasper
7 years ago
discuss
29.
Tell HN: OpenAI charges you even if you're helping them on their Eval platform
(github.com/openai)
1 point
behnamoh
3 years ago
2 comments
30.
OpenAI crowdsources LLM benchmarking datasets by offering advanced GPT-4 access
(github.com/openai)
1 point
teaearlgraycold
3 years ago
2 comments