Extracting data from multiple PDFs

Question

Asked by maliebina on 03 March 2023 at 10:20 · 82 views

I have 200 PDF files, all formatted similarly.

Currently I am opening each PDF and looking for the two relevant values and typing them into an Excel table, all manually.

I'm wondering if there is a way to automate this. My (non-IT background) idea is to write a program that runs OCR on all the files in a folder, finds and extracts the relevant data in CSV format, and transfers it to Excel.

Could anyone give me some pointers on how to approach this? Is something like this possible at all? Is there a language that's better suited to this task than others? Would VBA or Power Query be helpful in any way?

There is 1 answer

**Benji over_9000 'benchonaut'** · Answer 1 · 2023-03-03T11:11:41+00:00

OCR

For the OCR part there are tons of tools to choose from.

There are many guides on how to install these tools, unfortunately mostly for Linux.

Relevant Data

A good question, but not a detailed one (and you may get downvotes because you did not say whether you need tables extracted or just text).

Of course you can use any programming language. An easy approach would be to OCR each PDF to its own text file; then, for example, `grep -l MYTERM myfiles` will list the names of the files that contain the term (on Linux, or in Git Bash under Windows).

Finally, generate a CSV that you import into Excel (the easy approach), or find a way to generate "real" Excel files.
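To make the last two steps a bit more concrete, here is a minimal Python sketch. It assumes the OCR step has already produced one `.txt` file per PDF in a single folder. The labels `Invoice No` and `Total`, the column names, and the function names are all made up for illustration, since the question does not say which two values are needed; swap in whatever wording actually appears in your documents.

```python
import csv
import re
from pathlib import Path

# Hypothetical labels: replace them with the text that precedes
# the two values in your own PDFs.
PATTERNS = {
    "invoice_no": re.compile(r"Invoice No\.?\s*:?\s*(\S+)"),
    "total": re.compile(r"Total\s*:?\s*([\d.,]+)"),
}


def extract_values(text):
    """Return the first match for each pattern, or "" when nothing matches."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row


def build_csv(text_dir, csv_path):
    """Scan every .txt file in text_dir and write one CSV row per file."""
    rows = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        # OCR output can contain odd bytes, so replace anything undecodable.
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append({"file": path.name, **extract_values(text)})

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", *PATTERNS])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `build_csv("ocr_output", "results.csv")` (both names are placeholders) would give you a CSV with one row per file that Excel can open directly. Files where a value was not found get an empty cell, so you can spot the ones that still need a manual look.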

Regards
