I have 200 PDF files, all formatted similarly.
Currently I open each PDF, look for the two relevant values, and type them into an Excel table, all by hand.
I'm wondering if there is a way to automate this. My idea (I don't have an IT background) is to write a program that OCR-scans all the files in a folder, finds and extracts the relevant data into CSV format, and transfers it to Excel.
I was wondering if anyone could give me some pointers on how to approach this. Is something even remotely similar possible? Is one language better suited for this task than the others? Would VBA or Power Query be helpful for this task in any way?
OCR
For the OCR part there are tons of tools, just to name a few popular ones:
There is plenty of documentation on how to install these tools, unfortunately mostly for Linux.
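To give an idea of what "OCR all files in a folder" can look like, here is a minimal Python sketch. It assumes two free command-line tools, poppler's pdftoppm (renders PDF pages to PNG images) and tesseract (does the OCR), are installed and on the PATH; they are just one possible choice, and the folder names are hypothetical. It writes one .txt file per PDF. (If your PDFs already contain selectable text, you may not need OCR at all.)

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_cmd(pdf: Path, image_prefix: Path, dpi: int = 300) -> list[str]:
    # Render every page of the PDF to PNG files named <prefix>-<page>.png
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(image_prefix)]


def tesseract_cmd(image: Path, out_base: Path) -> list[str]:
    # tesseract writes the recognized text to <out_base>.txt
    return ["tesseract", str(image), str(out_base)]


def page_number(image: Path) -> int:
    # "page-07.png" -> 7; sorting numerically keeps the pages in order
    return int(image.stem.rsplit("-", 1)[1])


def ocr_pdf(pdf: Path, out_dir: Path) -> Path:
    """OCR every page of one PDF into out_dir/<pdf name>.txt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{pdf.stem}.txt"
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        subprocess.run(pdftoppm_cmd(pdf, tmp_dir / "page"), check=True)
        texts = []
        for image in sorted(tmp_dir.glob("page-*.png"), key=page_number):
            out_base = tmp_dir / image.stem
            subprocess.run(tesseract_cmd(image, out_base), check=True)
            texts.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    out_file.write_text("\n".join(texts), encoding="utf-8")
    return out_file


if __name__ == "__main__":
    # Hypothetical folder names; point these at your own folders.
    for pdf in sorted(Path("pdfs").glob("*.pdf")):
        print("done:", ocr_pdf(pdf, Path("ocr_text")))
```

Giving each text file the same name as its PDF makes it easy to trace every extracted value back to its source document later.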
Relevant Data
A good question, but not a detailed one (and you may get downvotes because you did not say whether you need tables extracted or just text).
Of course you can use any programming language. An easy approach would be to OCR each PDF into its own text file; then, e.g.,
grep -l MYTERM myfiles will list the names of the files that contain MYTERM (Linux, or Git Bash under Windows), and finally generate a CSV that you import into Excel (the easy approach) or find a way to generate "real" Excel files.
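The "search the text files, then build a CSV" step can also be done in one Python script, which avoids needing grep or Git Bash on Windows. This is a minimal sketch that assumes the OCR step produced one .txt file per PDF. The labels "Invoice number" and "Total", the regular expressions, and the folder and file names are hypothetical stand-ins for whatever your two values actually are. The CSV is written as utf-8-sig (UTF-8 with a byte-order mark), which helps Excel detect the encoding when you open it.

```python
import csv
import re
from pathlib import Path

# Hypothetical patterns: a label, an optional colon, then the value to capture.
# \b stops "Total" from also matching inside words like "Subtotal".
PATTERNS = {
    "invoice_number": re.compile(r"\bInvoice number\s*:?\s*(\S+)", re.IGNORECASE),
    "total": re.compile(r"\bTotal\b\s*:?\s*([\d.,]+)", re.IGNORECASE),
}


def extract_values(text: str) -> dict[str, str]:
    """Return the first match for each pattern, or '' if it is missing."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row


def folder_to_csv(text_dir: Path, csv_path: Path) -> int:
    """Write one CSV row per .txt file in text_dir; return the row count."""
    files = sorted(text_dir.glob("*.txt"))
    # utf-8-sig adds a byte-order mark so Excel reads the encoding correctly
    with csv_path.open("w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=["file", *PATTERNS])
        writer.writeheader()
        for txt in files:
            row = extract_values(txt.read_text(encoding="utf-8"))
            writer.writerow({"file": txt.name, **row})
    return len(files)


if __name__ == "__main__":
    # Hypothetical names: the OCR output folder and the CSV to open in Excel.
    text_dir = Path("ocr_text")
    if text_dir.is_dir():
        print(folder_to_csv(text_dir, Path("values.csv")), "rows written")
```

Regular expressions are used instead of fixed positions because OCR output rarely preserves the exact layout. Empty cells in the CSV then point you to the files where a label was not found or was misread, which are worth checking by hand.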
Regards