Heykuki News

Top | New | Best | Ask | Show | Jobs
301.
Mcpbr: Stop guessing and evaluate your MCP server against standard benchmarks (github.com/greynewell)
3 points
captradeoff
5 months ago
1 comment
302.
Rogue: Open-source AI agent evaluation framework (github.com/qualifire-dev)
3 points
drorivryQF
8 months ago
1 comment
303.
AWorld: Build, evaluate and train General Multi-Agent Assistance with ease (github.com/inclusionAI)
3 points
gfortaine
10 months ago
1 comment
304.
15 AI Coding Agents evaluated with the same prompt (github.com/The-Focus-AI)
3 points
combray
a year ago
1 comment
305.
NoLiMa: Long-Context Evaluation Beyond Literal Matching (github.com/adobe-research)
3 points
consumer451
a year ago
1 comment
306.
I built an ethical evaluation engine for scoring sys. alignment, not efficiency (github.com/luminaAnonima)
3 points
luminaAnonima
a year ago
1 comment
307.
A novel open-source framework for evaluating conversational agents (github.com/plurai-ai)
3 points
nirdiamant
a year ago
1 comment
308.
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (github.com/microsoft)
3 points
francedot
2 years ago
1 comment
309.
Cedar – open-source policy language and evaluation engine (github.com/cedar-policy)
3 points
max2
3 years ago
1 comment
310.
Show HN: Evaluate Deep Learning models directly in a database with PyNeuraLogic (github.com/LukasZahradnik)
3 points
LukasZahradnik
4 years ago
1 comment
311.
Show HN: Wielder – Write and evaluate Clojure code in your Obsidian documents (github.com/victorb)
3 points
diggan
4 years ago
1 comment
312.
Show HN: Oyster, an interactive Perl eval server (github.com/gatlin)
3 points
gatlin
15 years ago
1 comment
313.
Koila: Prevent PyTorch's out of memory error with lazy evaluation (github.com/rentruewang)
3 points
b06901038
5 years ago
1 comment
314.
Simple Safe Sandboxed Extensible Expression Evaluator for Python (github.com/danthedeckie)
3 points
wilsonfiifi
8 years ago
1 comment
315.
Show HN: ClojureCalc, a LibreOffice Calc add-in to evaluate Clojure expressions (github.com/beothorn)
3 points
beothorn
11 years ago
discuss
316.
Rouge.js: Recall-Oriented Understudy for Gisting Evaluation Metric (github.com/kenlimmj)
3 points
kenlimmj
11 years ago
discuss
317.
Cisco Foundry Security Spec: Open specification for agentic security evaluation (github.com/CiscoDevNet)
3 points
cpard
15 days ago
discuss
318.
Show HN: Nexa-gauge – Cache/cost-aware graph-based eval for LLM and RAG (github.com/harnexa)
3 points
Sardhendu
a month ago
discuss
319.
Show HN: FC-Eval – CLI to Benchmark Local or Cloud LLMs on Function Calling (github.com/gauravvij)
3 points
gauravvij137
3 months ago
discuss
320.
Show HN: Rhesis AI - Multimodal test cases for agentic evals
3 points
nicolaib
3 months ago
discuss
321.
Show HN: Auditi – open-source LLM tracing and evaluation platform (github.com/deduu)
3 points
ariansyah
4 months ago
discuss
322.
Harbor – a framework for evaluating and optimizing agents and language models (github.com/laude-institute)
3 points
piebro
7 months ago
discuss
323.
OpenBench: Provider-agnostic, open-source evaluation infrastructure for LLMs (github.com/groq)
3 points
ofou
8 months ago
discuss
324.
Show HN: Evaluate your website usability in seconds (desplega.ai)
3 points
tarasyarema
8 months ago
discuss
325.
LLM Evaluation via Rap Battles (github.com/vadim0x60)
3 points
vadimdotme
9 months ago
discuss
326.
Show HN: Prompt-to-proof: reproducible LLM eval with hash-chained receipts (github.com/kju4q)
3 points
Qendresahoti
9 months ago
discuss
327.
Provider-agnostic, open-source evaluation infra for LLMs (github.com/groq)
3 points
nkko
10 months ago
discuss
328.
Show HN: Zbench, RAG evals using chess Elo ratings (github.com/zeroentropy-ai)
3 points
ghita_
10 months ago
discuss
329.
MCPvals, an eval library for MCP Servers (github.com/Kylejeong2)
3 points
gniting
a year ago
discuss
330.
A collection of resources about normalization-by-evaluation (github.com/etiams)
3 points
etiams
a year ago
discuss