Using Tabula to extract table - Mixing rows and columns

I am trying to use tabula-py (the Python wrapper for Tabula) to extract a table from a PDF. I used the Tabula app to generate a template, and in the app the output looks correct, as shown below:

Selection area using Tabula

Preview from Tabula

I then used the area coordinates from the Tabula template in my code:

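import tabula

# Selection from the Tabula template: top-left corner plus height and width
y1 = 77.0376069164276
x1 = 23.662381164550744
y2 = y1 + 732.1944360351562  # y1 + height
x2 = x1 + 546.9135269165039  # x1 + width
dfs2 = tabula.read_pdf(input_path="C:\\Users\\pedro\\Downloads\\BDI_00_20231228.pdf",
                        pandas_options={'header': None},
                        pages=[612],
                        guess=False,
                        area=[[y1, x1, y2, x2]])  # [top, left, bottom, right]
dfs2[0]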
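
For reference, the coordinates can also be read from the template file instead of being typed by hand. This is only a minimal sketch: it assumes the exported template is a JSON list of selections with the keys `page`, `x1`, `y1`, `width` and `height`, and the names `TEMPLATE_JSON` and `template_to_areas` are hypothetical, made up for this illustration. The numbers are the ones from the code above.

```python
import json

# Hypothetical template content. The key names are an assumption about what
# the Tabula app exports; the values are the ones used in the question.
TEMPLATE_JSON = """
[
  {"page": 612, "extraction_method": "stream",
   "x1": 23.662381164550744, "y1": 77.0376069164276,
   "width": 546.9135269165039, "height": 732.1944360351562}
]
"""


def template_to_areas(template_text):
    """Return a list of (page, [top, left, bottom, right]) tuples."""
    areas = []
    for sel in json.loads(template_text):
        top, left = sel["y1"], sel["x1"]
        bottom = top + sel["height"]  # same arithmetic as y2 in the question
        right = left + sel["width"]   # same arithmetic as x2 in the question
        areas.append((sel["page"], [top, left, bottom, right]))
    return areas


areas = template_to_areas(TEMPLATE_JSON)
for page, area in areas:
    print(page, area)
    # Each pair could then be passed on, for example:
    # tabula.read_pdf(path, pages=[page], area=[area], guess=False)
```

tabula-py also has `tabula.read_pdf_with_template`, which takes the PDF path and the template path directly, so that may be a simpler way to reproduce what the app shows.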

In the output below, the columns appear to be mixed up with some of the rows:

Tabula-py Table - Dataframe

There are 0 answers