Heykuki News
541.
Litmus: LLM Testing and Evaluation Tool for AI App Development on Google Cloud
(github.com/google)
1 point
joburgalex
2 years ago
542.
Llama Stack by Meta – Inference, Safety, Memory, Agentic System, Evaluation
(github.com/meta-llama)
1 point
vikrantrathore
2 years ago
543.
Unibench: Vision-Language Model Evaluation
(github.com/facebookresearch)
1 point
zerojames
2 years ago
544.
LLM Evaluation Methods
(github.com/alopatenko)
1 point
pltig
2 years ago
545.
Show HN: Serializable infix expressions and a Python evaluator
(github.com/shrir)
1 point
sb13
2 years ago
546.
FreeEval: A Framework for Trustworthy and Efficient Evaluation of LLMs
(github.com/WisdomShell)
1 point
PaulHoule
2 years ago
547.
Llama.cpp: Improve CPU prompt eval speed
(github.com/ggerganov)
1 point
tosh
2 years ago
548.
Evaluate LLMs in Real Time with Street Fighter III
(github.com/OpenGenerativeAI)
1 point
magoghm
2 years ago
549.
Evaluating Claude 3 for Converting Screenshots to Code
(github.com/abi)
1 point
abi
2 years ago
550.
Show HN: Hiring when you don't know exactly how to evaluate candidates
(github.com/joelparkerhenderson)
1 point
jph
2 years ago
551.
Multi-bitrate JPEG compression perceptual evaluation dataset 2023
(github.com/google-research)
1 point
ksec
2 years ago
552.
Show HN: Lone Arena – Self-hosted LLM human evaluation, you be the judge
(github.com/Contextualist)
1 point
Contextualist
2 years ago
553.
IFEval: Evaluator for LLMs
(github.com/Rohan2002)
1 point
simonpure
2 years ago
554.
Genealogos takes outputs from Nix evaluation tools and produces SBoM files
(github.com/tweag)
1 point
ghuntley
2 years ago
555.
Show HN: Open-source evaluations for web agents
(github.com/reworkd)
1 point
asim-shrestha
3 years ago
556.
PhaseLLM Eval: run batch LLM jobs and evals via visual front-end (MIT licensed)
(github.com/wgryc)
1 point
cl42
3 years ago
557.
Thudm/AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents
(github.com/THUDM)
1 point
freediver
3 years ago
558.
AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents
(github.com/THUDM)
1 point
swyx
3 years ago
559.
Evaluate Multiple LLMs Easily
(github.com/ray-project)
1 point
fzliu
3 years ago
560.
Show HN: ChainForge, a visual tool for evaluating LLM responses
(github.com/ianarawjo)
1 point
fatso784
3 years ago
561.
Lazy evaluation and infinite streams in C++
(github.com/apresta)
1 point
jimmy2times
14 years ago
562.
Git-REPL: A Git REPL (read-eval-print loop) courtesy of rlwrap
(github.com/jcsalterego)
1 point
jcsalterego
3 years ago
563.
Show HN: Made a simple parser/evaluator of arithmetic expressions in Python
(github.com/beyonddream)
1 point
beyonddream
3 years ago
564.
A safe eval library based on WebAssembly and Duktape/QuickJS
(github.com/maple3142)
1 point
tosh
4 years ago
565.
Feature request: add PR curve in TensorFlow object detection API / eval.p
(github.com/tensorflow)
1 point
prakashqwerty
4 years ago
566.
Latte: Evaluation Framework for Disentangled Latent Spaces
(github.com/karnwatcharasupat)
1 point
lerch
4 years ago
567.
Show HN: Rues an Expression Evaluation Sidecar
(github.com/maxpert)
1 point
maxpert
4 years ago
568.
Rues Is an Expression Evaluation as Service
(github.com/maxpert)
1 point
todsacerdoti
4 years ago
569.
Show HN: A generic policy and constraint evaluator (Go)
(github.com/tadasv)
1 point
vptr
5 years ago
570.
acorn-macros – Evaluates and replaces JavaScript macros with Acorn
(github.com/heyheyhello)
1 point
ducaale
5 years ago