Heykuki News

1. Show HN: O3 beats Sonnet 4 at coding (in our codebase, wrt our preferences)
   2 points by kmckiern, 10 months ago
2. Show HN: Mandoline – Custom LLM Evaluations for Real-World Use Cases (mandoline.ai)
   2 points by kmckiern, 2 years ago
3. Refusals (LLM Leaderboard) (mandoline.ai)
   2 points by kmckiern, 2 years ago
4. Comparing Refusal Behavior Across Top Language Models (mandoline.ai)
   2 points by kmckiern, 2 years ago