So far, I have scrubbed the HN SWE-bench articles from the last 9 months, and tried Google scholar searches for LLM based refactoring with Proofs or DSLs. I am confident that they mention SWE-bench in the abstract, but it does not seem to be mentioned in the HN announcement. I don't remember if it was a custom DSL, or if they used a well known verification language. There's a chance they did not call it a DSL.
Closest matches so far:
1. https://arxiv.org/pdf/2305.17485. Generates refactorings with proofs for a simple prolog-like language, but does not generalize to SWE-bench tasks
2. https://arxiv.org/abs/2407.21787. Filters hundreds of DeepSeek-V2-Coder-Instruct solutions to get 56% on SWE-bench Lite, but does not use equivalence proofs
3. https://arxiv.org/pdf/2402.08699. Uses round trip error with a neural equivalence heuristic, but does not generate equivalence proofs
4. Generates controller and predicate code from a Temporal Logic spec in TSL, but does not generate the TSL spec or let us refactor code.
I appreciate any help!