Search: github.com/evalstate | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

91.

Thudm/AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (github.com/THUDM)

1 point

3 years ago

92.

AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (github.com/THUDM)

1 point

3 years ago

93.

Evaluate Multiple LLMs Easily (github.com/ray-project)

1 point

3 years ago

94.

acorn-macros – Evaluates and replaces JavaScript macros with Acorn (github.com/heyheyhello)

1 point

5 years ago

95.

Show HN: Evaluate Markdown code blocks within Vim (github.com/gpanders)

1 point

6 years ago

96.

Show HN: Little tool to evaluate your cryptocurrency trades on Poloniex (github.com/enricobacis)

1 point

9 years ago

97.

Seeing Is Believing: Evaluates Ruby code, recording the results of each line (github.com/JoshCheek)

1 point

9 years ago

98.

Go proposal: clarify how proposals are evaluated (github.com/golang)

1 point

10 years ago

99.

Evaluate JavaScript code blocks from within markdown (github.com/reggi)

1 point

11 years ago

100.

Ask HN: How do you familiarize yourself with a new codebase?

407 points

11 years ago

101.

Show HN: Faster FastAPI with simdjson and io_uring on Linux 5.19 (github.com/unum-cloud)

290 points

3 years ago

102.

Show HN: EVA – AI-Relational Database System (github.com/georgia-tech-db)

237 points

3 years ago

103.

Show HN: Open-source, browser-local data exploration using DuckDB-WASM and PRQL (github.com/pretzelai)

227 points

2 years ago

104.

Show HN: Mizu.js – Lightweight HTML templating library for any-side rendering (mizu.sh)

225 points

a year ago

105.

Show HN: Qwen-2.5-32B is now the best open source OCR model (github.com/getomni-ai)

211 points

a year ago

106.

Show HN: PromptTools – open-source tools for evaluating LLMs and vector DBs (github.com/hegelai)

211 points

3 years ago

107.

Less Slow C++ (github.com/ashvardanian)

198 points

a year ago

108.

Show HN: cmux - Ghostty-based terminal with vertical tabs and notifications (github.com/manaflow-ai)

198 points

4 months ago

109.

Show HN: Use Code Llama as Drop-In Replacement for Copilot Chat (continue.dev)

187 points

3 years ago

110.

Show HN: Benchmarking VLMs vs. Traditional OCR (getomni.ai)

146 points

a year ago

111.

Show HN: Quality News – Towards a fairer ranking algorithm for Hacker News (news.social-protocols.org)

140 points

3 years ago

112.

Show HN: Statewright – Visual state machines that make AI agents reliable (github.com/statewright)

126 points

25 days ago

113.

Show HN: Terminal-Bench-RL: Training long-horizon terminal agents with RL (github.com/Danau5tin)

125 points

10 months ago

114.

Show HN: Ragas – Open-source library for evaluating RAG pipelines (github.com/explodinggradients)

121 points

2 years ago

115.

Launch HN: Confident AI (YC W25) – Open-source evaluation framework for LLM apps

117 points

a year ago

116.

Show HN: Improve LLM Performance by Maximizing Iterative Development (github.com/palico-ai)

104 points

2 years ago

117.

Launch HN: Grai (YC S22) – Open-Source Data Observability Platform

101 points

ersatz_username

3 years ago

118.

Show HN: Dittofeed – 1-Click deploy, self-host Mailchimp alternative (github.com/dittofeed)

100 points

3 years ago

119.

Show HN: Burr – A framework for building and debugging GenAI apps faster (github.com/DAGWorks-Inc)

94 points

2 years ago

120.

Show HN: Gambit, an open-source agent harness for building reliable AI agents (github.com/bolt-foundry)

91 points

5 months ago