I've designed a benchmark that evaluates different LLMs on high-quality, representative competitive programming problems sourced from cses.fi. So far, I have benchmarked several SOTA models, including Claude 3.5 Sonnet and GPT-4o, and created initial visualizations and evaluations of the results. Suggestions are welcome :)
Show HN: Claude 3.5 Sonnet beats GPT-4o at Competitive Programming