Heykuki News

Top | New | Best | Ask | Show | Jobs
541.
Litmus: LLM Testing and Evaluation Tool for AI App Development on Google Cloud (github.com/google)
1 point
joburgalex
2 years ago
discuss
542.
Llama Stack by Meta – Inference, Safety, Memory, Agentic System, Evaluation (github.com/meta-llama)
1 point
vikrantrathore
2 years ago
discuss
543.
Unibench: Vision-Language Model Evaluation (github.com/facebookresearch)
1 point
zerojames
2 years ago
discuss
544.
LLM Evaluation Methods (github.com/alopatenko)
1 point
pltig
2 years ago
discuss
545.
Show HN: Serializable infix expressions and a Python evaluator (github.com/shrir)
1 point
sb13
2 years ago
discuss
546.
FreeEval: A Framework for Trustworthy and Efficient Evaluation of LLMs (github.com/WisdomShell)
1 point
PaulHoule
2 years ago
discuss
547.
Llama.cpp: Improve CPU prompt eval speed (github.com/ggerganov)
1 point
tosh
2 years ago
discuss
548.
Evaluate LLMs in Real Time with Street Fighter III (github.com/OpenGenerativeAI)
1 point
magoghm
2 years ago
discuss
549.
Evaluating Claude 3 for Converting Screenshots to Code (github.com/abi)
1 point
abi
2 years ago
discuss
550.
Show HN: Hiring when you don't know exactly how to evaluate candidates (github.com/joelparkerhenderson)
1 point
jph
2 years ago
discuss
551.
Multi-bitrate JPEG compression perceptual evaluation dataset 2023 (github.com/google-research)
1 point
ksec
2 years ago
discuss
552.
Show HN: Lone Arena – Self-hosted LLM human evaluation, you be the judge (github.com/Contextualist)
1 point
Contextualist
2 years ago
discuss
553.
IFEval: Evaluator for LLMs (github.com/Rohan2002)
1 point
simonpure
2 years ago
discuss
554.
Genealogos takes outputs from Nix evaluation tools and produces SBoM files (github.com/tweag)
1 point
ghuntley
2 years ago
discuss
555.
Show HN: Open-source evaluations for web agents (github.com/reworkd)
1 point
asim-shrestha
3 years ago
discuss
556.
PhaseLLM Eval: run batch LLM jobs and evals via visual front-end (MIT licensed) (github.com/wgryc)
1 point
cl42
3 years ago
discuss
557.
THUDM/AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (github.com/THUDM)
1 point
freediver
3 years ago
discuss
558.
AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (github.com/THUDM)
1 point
swyx
3 years ago
discuss
559.
Evaluate Multiple LLMs Easily (github.com/ray-project)
1 point
fzliu
3 years ago
discuss
560.
Show HN: ChainForge, a visual tool for evaluating LLM responses (github.com/ianarawjo)
1 point
fatso784
3 years ago
discuss
561.
Lazy evaluation and infinite streams in C++ (github.com/apresta)
1 point
jimmy2times
14 years ago
discuss
562.
Git-REPL: A Git REPL (read-eval-print loop) courtesy of rlwrap (github.com/jcsalterego)
1 point
jcsalterego
3 years ago
discuss
563.
Show HN: Made a simple parser/evaluator of arithmetic expressions in Python (github.com/beyonddream)
1 point
beyonddream
3 years ago
discuss
564.
A safe eval library based on WebAssembly and Duktape/QuickJS (github.com/maple3142)
1 point
tosh
4 years ago
discuss
565.
Feature request: add PR curve in TensorFlow object detection API / eval.p (github.com/tensorflow)
1 point
prakashqwerty
4 years ago
discuss
566.
Latte: Evaluation Framework for Disentangled Latent Spaces (github.com/karnwatcharasupat)
1 point
lerch
4 years ago
discuss
567.
Show HN: Rues, an Expression Evaluation Sidecar (github.com/maxpert)
1 point
maxpert
4 years ago
discuss
568.
Rues Is an Expression Evaluation as a Service (github.com/maxpert)
1 point
todsacerdoti
4 years ago
discuss
569.
Show HN: A generic policy and constraint evaluator (Go) (github.com/tadasv)
1 point
vptr
5 years ago
discuss
570.
acorn-macros – Evaluates and replaces JavaScript macros with Acorn (github.com/heyheyhello)
1 point
ducaale
5 years ago
discuss