I've been wondering for a while whether LLMs perform better out of the box on languages that are more restrictive than Python. Is anyone using Claude, GPT, Cursor, Aider, etc. for Ada or functional languages, and does the generated code have fewer errors than Python code, even if that's a subjective impression? The question is partly inspired by this paper: https://arxiv.org/pdf/2410.01215 "From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging". Sometimes LLMs are fantastic, and sometimes it feels like a battle to get simple things done or to pick up where I left off. I'm trying to get a feel for how much of the error rate comes from a language's flexibility (how many different ways it lets you do the same thing) and how much comes from the quality of the LLM. I'm also wondering whether tools like the one in the paper will be needed for future tooling to improve.