Search: github.com/eval | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

301.

Mcpbr: Stop guessing and evaluate your MCP server against standard benchmarks (github.com/greynewell)

3 points

5 months ago

302.

Rogue: Open-source AI agent evaluation framework (github.com/qualifire-dev)

3 points

8 months ago

303.

AWorld: Build, evaluate and train General Multi-Agent Assistance with ease (github.com/inclusionAI)

3 points

10 months ago

304.

15 AI Coding Agents evaluated with the same prompt (github.com/The-Focus-AI)

3 points

a year ago

305.

NoLiMa: Long-Context Evaluation Beyond Literal Matching (github.com/adobe-research)

3 points

a year ago

306.

I built an ethical evaluation engine for scoring sys. alignment, not efficiency (github.com/luminaAnonima)

3 points

a year ago

307.

A novel open-source framework for evaluating conversational agents (github.com/plurai-ai)

3 points

a year ago

308.

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (github.com/microsoft)

3 points

2 years ago

309.

Cedar – open-source policy language and evaluation engine (github.com/cedar-policy)

3 points

3 years ago

310.

Show HN: Evaluate Deep Learning models directly in a database with PyNeuraLogic (github.com/LukasZahradnik)

3 points

4 years ago

311.

Show HN: Wielder – Write and evaluate Clojure code in your Obsidian documents (github.com/victorb)

3 points

4 years ago

312.

Show HN: Oyster, an interactive Perl eval server (github.com/gatlin)

3 points

15 years ago

313.

Koila: Prevent PyTorch's out of memory error with lazy evaluation (github.com/rentruewang)

3 points

5 years ago

314.

Simple Safe Sandboxed Extensible Expression Evaluator for Python (github.com/danthedeckie)

3 points

8 years ago

315.

Show HN: ClojureCalc, a LibreOffice Calc Add-In to evaluate Clojure expressions (github.com/beothorn)

3 points

11 years ago

316.

Rouge.js: Recall-Oriented Understudy for Gisting Evaluation Metric (github.com/kenlimmj)

3 points

11 years ago

317.

Cisco Foundry Security Spec: Open specification for agentic security evaluation (github.com/CiscoDevNet)

3 points

15 days ago

318.

Show HN: Nexa-gauge – Cache/cost-aware graph-based eval for LLM and RAG (github.com/harnexa)

3 points

a month ago

319.

Show HN: FC-Eval – CLI to Benchmark Local or Cloud LLMs on Function Calling (github.com/gauravvij)

3 points

3 months ago

320.

Show HN: Rhesis AI - Multimodal test cases for agentic evals

3 points

3 months ago

321.

Show HN: Auditi – open-source LLM tracing and evaluation platform (github.com/deduu)

3 points

4 months ago

322.

Harbor – a framework for evaluating and optimizing agents and language models (github.com/laude-institute)

3 points

7 months ago

323.

OpenBench: Provider-agnostic, open-source evaluation infrastructure for LLMs (github.com/groq)

3 points

8 months ago

324.

Show HN: Evaluate your website usability in seconds (desplega.ai)

3 points

8 months ago

325.

LLM Evaluation via Rap Battles (github.com/vadim0x60)

3 points

9 months ago

326.

Show HN: Prompt-to-proof: reproducible LLM eval with hash-chained receipts (github.com/kju4q)

3 points

9 months ago

327.

Provider-agnostic, open-source evaluation infra for LLMs (github.com/groq)

3 points

10 months ago

328.

Show HN: Zbench, RAG evals using chess Elo ratings (github.com/zeroentropy-ai)

3 points

10 months ago

329.

MCPvals, an eval library for MCP Servers (github.com/Kylejeong2)

3 points

a year ago

330.

A collection of resources about normalization-by-evaluation (github.com/etiams)

3 points

a year ago