Search: github.com/eval | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

61.

Clojure protected REPL (github.com/stacksideflow)

3 points

7 years ago

62.

Evaluate Selections in Sublime Text (github.com/jbrooksuk)

2 points

13 years ago

63.

Estonia publishes its e-voting source code (github.com/vvk-ehk)

2 points

13 years ago

64.

Evaluation of Various MLX Quantizations (github.com/deepsweet)

2 points

18 days ago

65.

Should we chaos test our agents? (github.com/Corbell-AI)

2 points

21 days ago

66.

Open-source LLM-as-judge eval suite with root cause analysis and failure mining (github.com/colingfly)

2 points

3 months ago

67.

Evaluating LLMs with CommonGen-Lite (github.com/allenai)

2 points

2 years ago

68.

Evals Skills for AI Agents (github.com/latitude-dev)

2 points

a month ago

69.

Show HN: Claude Code skills for building LLM evals (github.com/latitude-dev)

2 points

a month ago

70.

Show HN: LLM‑Simple‑Eval – Easily Benchmark LLMs for Your Use Case (github.com/grigio)

2 points

9 months ago

71.

Evaluating Large Language Models Using LLM-as-a-Judge (github.com/aws-samples)

2 points

2 years ago

72.

GPT-4-turbo-2024-04-09 "wins" simple evals benchmark (github.com/openai)

2 points

2 years ago

73.

A survey on evaluation of large language models (github.com/MLGroupJLU)

2 points

3 years ago

74.

OpenFF – Automated estimation of physical properties (github.com/openforcefield)

2 points

5 years ago

75.

Show HN: IR_evaluation – Information retrieval evaluation metrics in pure Python (github.com/plurch)

1 point

a year ago

76.

Pulze AI Evals (github.com/pulzeai-oss)

1 point

a year ago

77.

BSD_Evals: Open-source LLM evaluation tool (github.com/brettdidonato)

1 point

2 years ago

78.

AgentSafeLabs – Launched Open-source Security framework for AI agents (github.com/AgentSafeLabs)

1 point

9 days ago

79.

Show HN: EleutherAI / Lm-Evaluation-Harness (github.com/EleutherAI)

1 point

23 days ago

80.

Webgrid Eval: LLM vision + tool-use on Neuralink's cursor control task (github.com/ofou)

1 point

3 months ago

81.

Network Evaluation Service (github.com/hendemic)

1 point

a year ago

82.

OpenAI: Simple-Evals (github.com/openai)

1 point

2 years ago

83.

ReactEval: Evaluating LLMs on front-end code generation (github.com/gitwitorg)

1 point

2 years ago

84.

Language Model Evaluation Harness (github.com/EleutherAI)

1 point

3 years ago

85.

Nextdoor's Cloud Security Posture Management (CSPM) Evaluation Matrix (github.com/Nextdoor)

1 point

3 years ago

86.

Show HN: EvalGPT – Code interpreter and agent framework inspired by Google Borg (github.com/index-labs)

1 point

3 years ago

87.

Trait-Eval – Rust (github.com/doctorn)

1 point

6 years ago

88.

Show HN: Little tool to evaluate your cryptocurrency trades on Poloniex (github.com/enricobacis)

1 point

9 years ago

89.

Show HN: Freeact – A Lightweight Library for Code-Action Based Agents (github.com/gradion-ai)

122 points

a year ago

90.

Show HN: Ellipsis – Automated PR reviews and bug fixes (ellipsis.dev)

121 points

2 years ago