There are popular data extraction and OCR tools on the market, but none of them are open source.
Most accounting firms, law firms, insurers, back-office teams, and real-estate companies would like to use a tool like this.
You can add PDF documents, images, and audio files, then create columns that answer questions about the documents or extract information into a tabular format.
Repo: https://github.com/harishdeivanayagam/rowfill
Screenshots: https://github.com/harishdeivanayagam/rowfill/tree/master/sc...
Sample use case: upload your invoices or receipts, create custom columns, and click Start Run. The data is then extracted automatically.
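To make the "documents in, table out" idea concrete, here is a minimal Python sketch of the general pattern: each document becomes a row and each custom column becomes a question asked of that document. This is not Rowfill's actual code or API. The names `Column`, `toy_extractor`, `run`, and `to_csv` are hypothetical, and the extractor is a toy regex standing in for whatever OCR or model call a real tool would make.

```python
import csv
import io
import re
from dataclasses import dataclass


@dataclass
class Column:
    name: str      # header in the output table
    question: str  # what the extractor is asked about each document


def toy_extractor(document_text: str, column: Column) -> str:
    """Hypothetical stand-in for the real extraction step (OCR or a model).

    It only looks for a "Name: value" line in text that is already plain.
    """
    pattern = rf"^{re.escape(column.name)}:\s*(.+)$"
    match = re.search(pattern, document_text, re.IGNORECASE | re.MULTILINE)
    return match.group(1).strip() if match else ""


def run(documents: dict[str, str], columns: list[Column], extractor=toy_extractor):
    """Build one row per document and one cell per column."""
    rows = []
    for filename, text in documents.items():
        row = {"document": filename}
        for column in columns:
            row[column.name] = extractor(text, column)
        rows.append(row)
    return rows


def to_csv(rows, columns) -> str:
    """Serialize the extracted rows as CSV text."""
    buffer = io.StringIO()
    fieldnames = ["document"] + [c.name for c in columns]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data for illustration only.
documents = {
    "invoice_001.txt": "Vendor: Acme Corp\nTotal: 120.50\n",
    "receipt_002.txt": "Vendor: Corner Cafe\nTotal: 8.75\n",
}
columns = [
    Column("Vendor", "Who issued this document?"),
    Column("Total", "What is the total amount?"),
]

rows = run(documents, columns)
print(to_csv(rows, columns))
```

Swapping `toy_extractor` for a function that calls an OCR engine or a language model would leave the row and column structure unchanged, which is the part the use case above describes.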