Borderless pdf extraction to json is not working properly for Python camelot library

Question


Asked by Goutam Ghosh on 24 September 2020 at 10:57

Can anyone help with this? We are extracting a borderless PDF to JSON with the Python camelot library, and the output does not contain the exact content of the PDF: some content is missing after extraction.

Original Q&A

There is 1 answer

**Stefano Fiorucci - anakin87** · Answer 1 · 2020-09-24T13:40:21+00:00

I tried the following code:

import camelot

pdf_path = '/YOUR/FILEPATH.pdf'
tables = camelot.read_pdf(pdf_path, flavor='stream')

There are two problems:

1. The header font is not read properly, so you get strange characters like (cid:71)...
2. With flavor='lattice', the table isn't detected at all. With flavor='stream', the table is detected, but the cells aren't detected properly.

At the moment, I don't think Camelot can extract this table properly. The Camelot developers are working on a fix for the second problem (see this and this).
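To see how badly a given extraction is affected, one option is to audit the extracted cells before writing them to JSON. The snippet below is only a rough sketch: `audit_rows` and `sample_rows` are hypothetical names I made up, and the sample data stands in for the rows you would get from a real table (for example via `tables[0].df.values.tolist()`, assuming the table was detected at all). It counts cells containing `(cid:N)` placeholders and cells that came back empty.

```python
import re

# Matches placeholders such as "(cid:71)", which appear when a glyph
# could not be mapped back to a readable character.
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def audit_rows(rows):
    """Count cells with (cid:N) placeholders and cells that are empty."""
    cid_cells = 0
    empty_cells = 0
    for row in rows:
        for cell in row:
            text = str(cell).strip()
            if not text:
                empty_cells += 1
            elif CID_PATTERN.search(text):
                cid_cells += 1
    return {"cid_cells": cid_cells, "empty_cells": empty_cells}


# Hypothetical sample data standing in for tables[0].df.values.tolist()
sample_rows = [
    ["(cid:71)(cid:72)", "Amount"],
    ["Item A", ""],
    ["Item B", "12.50"],
]

report = audit_rows(sample_rows)
print(report)
```

A non-zero `cid_cells` count points at the font problem, while many empty cells in a table that looks full in the PDF suggest that the cells were not split correctly. Neither check repairs the extraction; it only tells you whether the JSON can be trusted.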
