Search: github.com/eval | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

511.

Show HN: Orangensaft – A mini Python-like language with LLM eval in lang runtime (github.com/jargnar)

1 point

4 months ago

512.

Show HN: Praetorian Guard – Free AI tool to self-evaluate your CV (educational) (github.com/simonesan-afk)

1 point

4 months ago

513.

MiRAGE: Open-source framework for multimodal RAG evaluation

1 point

4 months ago

514.

The Vocabulary Priming Confound in LLM Evaluation [pdf] (github.com/Palmerschallon)

1 point

4 months ago

515.

Open source agents to evaluate, debug, and optimize your prompts (github.com/comet-ml)

1 point

5 months ago

516.

Simboba: Evals for your AI product in under 5 mins (github.com/ntkris)

1 point

5 months ago

517.

Live-trade-bench: Live evaluation of trading agents (github.com/ulab-uiuc)

1 point

5 months ago

518.

Show HN: Dokimos – LLM evaluation framework for Java (github.com/dokimos-dev)

1 point

5 months ago

519.

Benchmark that evaluates LLMs using 759 NYT Connections puzzles (github.com/lechmazur)

1 point

6 months ago

520.

Show HN: smallevals – Local LLM Evaluation Framework with Tiny 0.6B Models (github.com/mburaksayici)

1 point

6 months ago

521.

Open source LLM prompt eval and optimization CLI (github.com/davismartens)

1 point

6 months ago

522.

Show HN: StructEval - a structured output evaluation and comparison tool (github.com/jhiker)

1 point

7 months ago

523.

Rogue – The AI Agent Evaluator (github.com/qualifire-dev)

1 point

7 months ago

524.

Show HN: Local RAG Eval Harness – reproducible benchmarks for retrieval pipelines

1 point

myroslavmokhamm

8 months ago

525.

TinyExpr: Parser, compiler, and evaluation engine for math expressions (github.com/codeplea)

1 point

8 months ago

526.

Benchmark code for evaluating different ASR packages and APIs (github.com/huggingface)

1 point

9 months ago

527.

Show HN: PromptDev – Prompt eval and testing for AI agents across providers (github.com/artefactop)

1 point

9 months ago

528.

numexpr: fast numerical array expression evaluator for Python (github.com/pydata)

1 point

10 months ago

529.

Quality and Safety Evaluations for AI Agents on Azure (github.com/aymenfurter)

1 point

10 months ago

530.

Show HN: Hypersigil – Prompt management UI – test, evaluate, deploy (github.com/hypersigilhq)

1 point

10 months ago

531.

Safe-MCP: Security Analysis Framework for Evaluation of Model Context Protocol (github.com/fkautz)

1 point

10 months ago

532.

RawBench: A minimal prompt evaluation framework (github.com/0xsomesh)

1 point

a year ago

533.

Assayer: Python-RQ watchdog for ML model checkpoint monitoring and evaluation (github.com/amoudgl)

1 point

a year ago

534.

Show HN: Digit-Class Prime Product Framework (Prime Factorization Evals for LMs) (github.com/arthurcolle)

1 point

a year ago

535.

E2E LLM evals, with less focus on metrics and more focus on binary assertions (github.com/openchatai)

1 point

a year ago

536.

Ask HN: What RAG evaluations do you care about?

1 point

a year ago

537.

NoLiMa: Long-Context Evaluation Beyond Literal Matching (github.com/adobe-research)

1 point

a year ago

538.

Evaluating and Training Multi-Modal Large Language Models for Action Recognition (github.com/AdaptiveMotorControlLab)

1 point

a year ago

539.

An Implementation of Eval() for Rust (github.com/evcxr)

1 point

a year ago

540.

I built a Python pipeline to evaluate the Exosome Complex in AlphaFold & CombFold (github.com/christopheragnus)

1 point

christopher8827

2 years ago