Search: github.com/lechmazur | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

Benchmarking LLM social skills with an elimination game (github.com/lechmazur)

194 points

colonCapitalDee

a year ago

2.

LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21 (github.com/lechmazur)

17 points

a year ago

3.

Show HN: LLM Debate Benchmark (github.com/lechmazur)

9 points

2 months ago

4.

LLM Persuasion Benchmark: Multi-Turn Persuasion Between Models (github.com/lechmazur)

9 points

2 months ago

5.

Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty (github.com/lechmazur)

8 points

10 months ago

6.

Show HN: LLM Creative Story‑Writing Benchmark V3 (github.com/lechmazur)

8 points

9 months ago

7.

Show HN: LLM Divergent Thinking Creativity Benchmark (github.com/lechmazur)

8 points

a year ago

8.

Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure (github.com/lechmazur)

7 points

a year ago

9.

Show HN: LLM Deceptiveness and Gullibility Benchmark (github.com/lechmazur)

7 points

2 years ago

10.

Show HN: Mapping LLM Style and Range in Flash Fiction (github.com/lechmazur)

7 points

9 months ago

11.

Emergent Price-Fixing by LLM Auction Agents (github.com/lechmazur)

7 points

a year ago

12.

Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark (github.com/lechmazur)

7 points

a year ago

13.

Show HN: Buyout Game Benchmark: Multi-Agent Bargaining, Transfers, and Takeovers (github.com/lechmazur)

6 points

2 months ago

14.

Show HN: LLM Round‑Trip Translation Benchmark (github.com/lechmazur)

6 points

9 months ago

15.

Pact: Head-to-head negotiation benchmark for LLMs (github.com/lechmazur)

6 points

9 months ago

16.

Show HN: LLM Thematic Generalization Benchmark (github.com/lechmazur)

6 points

a year ago

17.

LLM Confabulation (Hallucination) Leaderboard (github.com/lechmazur)

6 points

2 years ago

18.

Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception (github.com/lechmazur)

5 points

a year ago

19.

Show HN: LLM Creative Story-Writing Benchmark (github.com/lechmazur)

5 points

a year ago

20.

Show HN: LLM Sycophancy Benchmark: Opposite-Narrator Contradictions (github.com/lechmazur)

3 points

3 months ago

21.

Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs (github.com/lechmazur)

2 points

a year ago

22.

Step-Game: Assessing LLM Collaboration and Deception Under Pressure (github.com/lechmazur)

2 points

a year ago

23.

Accurately calculating the number of legal chess positions (github.com/lechmazur)

2 points

5 years ago

24.

LLM Position Bias Benchmark: Swapped-Order Pairwise Judging (github.com/lechmazur)

1 point

a month ago

25.

Benchmark that evaluates LLMs using 759 NYT Connections puzzles (github.com/lechmazur)

1 point

6 months ago

26.

NYT Connections LLM Benchmark (github.com/lechmazur)

1 point

6 months ago