Search: github.com/eval | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

Ask HN: Are you willing to contribute to OpenAI Evals?

8 points

3 years ago

2.

Show HN: MCP Bridge – Access Local MCP Servers Remotely (github.com/EvalsOne)

3 points

a year ago

3.

Show HN: Iris – first MCP-native eval and observability tool for AI agents (github.com/iris-eval)

1 point

3 months ago

4.

Show HN: Evalcraft – cassette-based testing for AI agents (pytest, $0/run) (github.com/beyhangl)

1 point

3 months ago

5.

Show HN: Visualize OpenAI Evals of GPT-4 (github.com/zeno-ml)

1 point

3 years ago

6.

Evals: a framework for evaluating OpenAI models and a registry of benchmarks (github.com/openai)

123 points

3 years ago

7.

Evals in 2025: going beyond simple benchmarks to build models people can use (github.com/huggingface)

80 points

9 months ago

8.

Try out Clojure libraries via rebel-readline (github.com/eval)

70 points

3 years ago

9.

Show HN: Fast-agent – Compose MCP enabled Agents and Workflows in minutes (github.com/evalstate)

29 points

a year ago

10.

eval_macro: A New Way to Write Rust Macros (github.com/wdanilo)

9 points

a year ago

11.

Show HN: Python lib to run evals across providers: OpenAI, Anthropic, etc. (github.com/crizCraig)

8 points

2 years ago

12.

Neo Emacs – A GPU-powered Emacs written in Rust with a modern display engine (github.com/eval-exec)

7 points

3 months ago

13.

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu (github.com/eval-exec)

7 points

4 months ago

14.

Show HN: Open-source dashboard for your domain experts to improve your AI Agents (github.com/getevalkit)

5 points

a year ago

15.

Exfiltrate Data with NTP (github.com/evallen)

5 points

4 years ago

16.

GPT-4 doesn't pay close attention to detail in some cases (github.com/openai)

3 points

3 years ago

17.

Source code for evaluating decoder-based models: GANs, GMMNs, and VAEs (github.com/tonywu95)

3 points

10 years ago

18.

Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions (github.com/AGI-Eval-Official)

3 points

5 months ago

19.

Example PR to OpenAI evals to get GPT-4 early API access (github.com/openai)

3 points

harrisonjackson

3 years ago

20.

EvalML: An AutoML library written in Python (github.com/alteryx)

3 points

5 years ago

21.

I hope to help you evaluate your GenAI App (github.com/shihongDev)

2 points

5 months ago

22.

Show HN: EvalView – Catch agent regressions before you ship (pytest for agents) (github.com/hidai25)

2 points

5 months ago

23.

Stop benchmarking LLMs. Make them fight (github.com/AGI-Eval-Official)

2 points

5 months ago

24.

Eval Protocol: RL for agents in any language, container, or framework (github.com/eval-protocol)

2 points

7 months ago

25.

Automatic Evals for LLMs (github.com/mlfoundations)

2 points

a year ago

26.

LLM Evaluation Guidebook (github.com/huggingface)

2 points

2 years ago

27.

HuggingFace/evaluate: A library for easily evaluating ML models and datasets (github.com/huggingface)

2 points

4 years ago

28.

Why Neutralinojs Is Better? Comparing with Electron and Node Webkit (github.com/neutralinojs)

2 points

7 years ago

29.

Tell HN: OpenAI charges you even if you're helping them on their Eval platform (github.com/openai)

1 point

3 years ago

30.

OpenAI crowd sources LLM benchmarking datasets by offering advanced GPT-4 access (github.com/openai)

1 point

teaearlgraycold

3 years ago