I am impressed by services such as DeepL, and by the ‘in-place’ OCR, translation, and layout recovery done by Safari, Google Translate (camera mode), etc.
But these methods are not very practical for, let's say, 500-1000 A4 scans (low-quality print, with widely varying layouts).
I have been exploring the standard tools (Tesseract with bounding-box detection, etc.), but the results are very bad compared to the cloud/closed services and apps mentioned above.
As mentioned in the title, I would like to run this locally, mostly for practical reasons (no need to upload gigabytes of scanned images to some service).
I found a promising approach here [0], but sadly I cannot get it to run at all.
In the age of computer vision, machine learning, and LLMs, do you know of any tools, approaches, or techniques that could help me translate my scans in a way that preserves the (widely varying) layout?
[0] https://github.com/phkhanhtrinh23/translation_layoutrecovery