It has proved quite useful for the use cases I've worked on since. With grading notes, you can leave short notes on the specific domain concepts that LLMs tend to get wrong, rather than writing a full reference answer, which takes a lot more labeling time.
I'd like to hear whether this approach, or something similar, has been useful for others too.