Heykuki News

1.
Benchmarking LLM social skills with an elimination game (github.com/lechmazur)
194 points
colonCapitalDee
a year ago
60 comments
2.
LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21 (github.com/lechmazur)
17 points
zone411
a year ago
3 comments
3.
Show HN: LLM Debate Benchmark (github.com/lechmazur)
9 points
zone411
2 months ago
3 comments
4.
LLM Persuasion Benchmark: Multi-Turn Persuasion Between Models (github.com/lechmazur)
9 points
zone411
2 months ago
discuss
5.
Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty (github.com/lechmazur)
8 points
zone411
10 months ago
1 comment
6.
Show HN: LLM Creative Story‑Writing Benchmark V3 (github.com/lechmazur)
8 points
zone411
9 months ago
discuss
7.
Show HN: LLM Divergent Thinking Creativity Benchmark (github.com/lechmazur)
8 points
zone411
a year ago
discuss
8.
Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure (github.com/lechmazur)
7 points
zone411
a year ago
1 comment
9.
Show HN: LLM Deceptiveness and Gullibility Benchmark (github.com/lechmazur)
7 points
zone411
2 years ago
1 comment
10.
Show HN: Mapping LLM Style and Range in Flash Fiction (github.com/lechmazur)
7 points
zone411
9 months ago
discuss
11.
Emergent Price-Fixing by LLM Auction Agents (github.com/lechmazur)
7 points
zone411
a year ago
discuss
12.
Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark (github.com/lechmazur)
7 points
zone411
a year ago
discuss
13.
Show HN: Buyout Game Benchmark: Multi-Agent Bargaining, Transfers, and Takeovers (github.com/lechmazur)
6 points
zone411
2 months ago
discuss
14.
Show HN: LLM Round‑Trip Translation Benchmark (github.com/lechmazur)
6 points
zone411
9 months ago
discuss
15.
Pact: Head-to-head negotiation benchmark for LLMs (github.com/lechmazur)
6 points
zone411
9 months ago
discuss
16.
Show HN: LLM Thematic Generalization Benchmark (github.com/lechmazur)
6 points
zone411
a year ago
discuss
17.
LLM Confabulation (Hallucination) Leaderboard (github.com/lechmazur)
6 points
zone411
2 years ago
discuss
18.
Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception (github.com/lechmazur)
5 points
zone411
a year ago
discuss
19.
Show HN: LLM Creative Story-Writing Benchmark (github.com/lechmazur)
5 points
zone411
a year ago
discuss
20.
Show HN: LLM Sycophancy Benchmark: Opposite-Narrator Contradictions (github.com/lechmazur)
3 points
zone411
3 months ago
discuss
21.
Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs (github.com/lechmazur)
2 points
amichail
a year ago
discuss
22.
Step-Game: Assessing LLM Collaboration and Deception Under Pressure (github.com/lechmazur)
2 points
amichail
a year ago
discuss
23.
Accurately calculating the number of legal chess positions (github.com/lechmazur)
2 points
slyall
5 years ago
discuss
24.
LLM Position Bias Benchmark: Swapped-Order Pairwise Judging (github.com/lechmazur)
1 point
zone411
a month ago
discuss
25.
Benchmark that evaluates LLMs using 759 NYT Connections puzzles (github.com/lechmazur)
1 point
ShrugLife
6 months ago
discuss
26.
NYT Connections LLM Benchmark (github.com/lechmazur)
1 point
cainxinth
6 months ago
discuss