Heykuki News

1.
Ask HN: Are you willing to contribute to OpenAI Evals?
8 points
nullptr_deref
3 years ago
5 comments
2.
Show HN: MCP Bridge – Access Local MCP Servers Remotely (github.com/EvalsOne)
3 points
everfly
a year ago
1 comment
3.
Show HN: Iris – first MCP-native eval and observability tool for AI agents (github.com/iris-eval)
1 point
iparent
3 months ago
discuss
4.
Show HN: Evalcraft – cassette-based testing for AI agents (pytest, $0/run) (github.com/beyhangl)
1 point
beyhang
3 months ago
discuss
5.
Show HN: Visualize OpenAI Evals of GPT-4 (github.com/zeno-ml)
1 point
confutio
3 years ago
discuss
6.
Evals: a framework for evaluating OpenAI models and a registry of benchmarks (github.com/openai)
123 points
tosh
3 years ago
16 comments
7.
Evals in 2025: going beyond simple benchmarks to build models people can use (github.com/huggingface)
80 points
jxmorris12
9 months ago
8 comments
8.
Try out Clojure libraries via rebel-readline (github.com/eval)
70 points
todsacerdoti
3 years ago
8 comments
9.
Show HN: Fast-agent – Compose MCP enabled Agents and Workflows in minutes (github.com/evalstate)
29 points
evalstate
a year ago
3 comments
10.
eval_macro: A New Way to Write Rust Macros (github.com/wdanilo)
9 points
W4G1
a year ago
discuss
11.
Show HN: Python lib to run evals across providers: OpenAI, Anthropic, etc. (github.com/crizCraig)
8 points
cr4zy
2 years ago
1 comment
12.
Neo Emacs – A GPU-powered Emacs written in Rust with a modern display engine (github.com/eval-exec)
7 points
agarttha
3 months ago
1 comment
13.
Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu (github.com/eval-exec)
7 points
evalexec
4 months ago
discuss
14.
Show HN: Open-source dashboard for your domain experts to improve your AI Agents (github.com/getevalkit)
5 points
mellowcookie
a year ago
discuss
15.
Exfiltrate Data with NTP (github.com/evallen)
5 points
amony
4 years ago
discuss
16.
GPT-4 doesn't pay close attention to detail in some cases (github.com/openai)
3 points
mcaledonensis
3 years ago
2 comments
17.
Source code for evaluating decoder-based models: GANs, GMMNs, and VAEs (github.com/tonywu95)
3 points
Dim25
10 years ago
1 comment
18.
Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions (github.com/AGI-Eval-Official)
3 points
jinqueeny
5 months ago
discuss
19.
Example PR to OpenAI evals to get GPT-4 early API access (github.com/openai)
3 points
harrisonjackson
3 years ago
discuss
20.
EvalML: An AutoML library written in Python (github.com/alteryx)
3 points
merqurio
5 years ago
discuss
21.
I hope to help you evaluate your GenAI App (github.com/shihongDev)
2 points
shloveai
5 months ago
2 comments
22.
Show HN: EvalView – Catch agent regressions before you ship (pytest for agents) (github.com/hidai25)
2 points
hidai25
5 months ago
1 comment
23.
Stop benchmarking LLMs. Make them fight (github.com/AGI-Eval-Official)
2 points
jinqueeny
5 months ago
discuss
24.
Eval Protocol: RL for agents in any language, container, or framework (github.com/eval-protocol)
2 points
dphuang2
7 months ago
discuss
25.
Automatic Evals for LLMs (github.com/mlfoundations)
2 points
saikatsg
a year ago
discuss
26.
LLM Evaluation Guidebook (github.com/huggingface)
2 points
erinys
2 years ago
discuss
27.
HuggingFace/evaluate: A library for easily evaluating ML models and datasets (github.com/huggingface)
2 points
occamschainsaw
4 years ago
discuss
28.
Why Neutralinojs Is Better? Comparing with Electron and Node Webkit (github.com/neutralinojs)
2 points
delvincasper
7 years ago
discuss
29.
Tell HN: OpenAI charges you even if you're helping them on their Eval platform (github.com/openai)
1 point
behnamoh
3 years ago
2 comments
30.
OpenAI crowd sources LLM benchmarking datasets by offering advanced GPT-4 access (github.com/openai)
1 point
teaearlgraycold
3 years ago
2 comments