Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
Show HN: O3 beats Sonnet 4 at coding (in our codebase, wrt our preferences)
2 points
kmckiern
10 months ago
2.
Show HN: Mandoline – Custom LLM Evaluations for Real-World Use Cases
(mandoline.ai)
2 points
kmckiern
2 years ago
3.
Refusals (LLM Leaderboard)
(mandoline.ai)
2 points
kmckiern
2 years ago
4.
Comparing Refusal Behavior Across Top Language Models
(mandoline.ai)
2 points
kmckiern
2 years ago