Some pages are skipped when using pdf-parse to extract text from a PDF

83 views Asked by dealwap At 08 February 2024 at 16:14

I'm currently using pdf-parse, a Node.js library, to extract text from a PDF file. However, certain pages are skipped during extraction. I've checked the PDF file, and it doesn't seem to be encrypted or corrupted. Also, when I open it in my Mac's PDF viewer, the text on a skipped page is searchable, so the page is real text and not a scanned image.

Here's the code I'm using:

const pdfParse = require('pdf-parse');
const fs = require('fs');

// Read the PDF file
const pdfPath = 'path/to/pdf/file.pdf';
const pdfBuffer = fs.readFileSync(pdfPath);

// Parse the PDF
pdfParse(pdfBuffer).then((data) => {
  console.log(data.text);
}).catch((error) => {
  console.error('An error occurred:', error);
});

When I run the code above, certain pages from the PDF are skipped during text extraction. What could be causing this, and how can I ensure that all pages are properly parsed by pdf-parse? Any insights or suggestions on resolving this problem would be greatly appreciated. Thank you!

I tried to extract the text from every page of the PDF with the code above, but some pages are skipped. I expected every page to be extracted and its text returned.
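
One way to narrow this down might be to record how much text each page yields, using the `pagerender` option that pdf-parse accepts as a custom per-page render callback. The sketch below is a diagnostic under stated assumptions, not a confirmed fix: `joinTextItems`, `createPageRenderer`, and `reportPages` are hypothetical helper names, the page object is assumed to expose `getTextContent()` and `pageIndex` as pdf.js page proxies do, and the callback is assumed to be invoked once per page in order.

```javascript
// Diagnostic sketch: record how much text each page yields.
// All helper names here are hypothetical, introduced for illustration.

// Join the text items of one page into a single string.
// Each item is assumed to carry a `str` field, as in pdf.js text content.
function joinTextItems(items) {
  return items.map((item) => item.str).join(' ').trim();
}

// Build a `pagerender` callback that appends one entry per page to `stats`.
function createPageRenderer(stats) {
  return async function renderPage(pageData) {
    // Assumption: pageData.pageIndex is zero-based; otherwise count calls.
    const page = typeof pageData.pageIndex === 'number'
      ? pageData.pageIndex + 1
      : stats.length + 1;
    try {
      const textContent = await pageData.getTextContent();
      const text = joinTextItems(textContent.items);
      stats.push({ page, chars: text.length, error: null });
      return text;
    } catch (err) {
      // Record the failure so it is visible instead of yielding silent empty text.
      stats.push({ page, chars: 0, error: err.message });
      return '';
    }
  };
}

// Parse the buffer and report which pages produced no text.
// `pdfParse` is passed in so the helper does not load the module itself.
async function reportPages(pdfParse, pdfBuffer) {
  const stats = [];
  const data = await pdfParse(pdfBuffer, { pagerender: createPageRenderer(stats) });
  const emptyPages = stats.filter((s) => s.chars === 0).map((s) => s.page);
  return { numpages: data.numpages, rendered: stats.length, emptyPages, stats };
}
```

With the real library this could be called as `reportPages(require('pdf-parse'), fs.readFileSync(pdfPath))`. If `rendered` is lower than `numpages`, or `emptyPages` is not empty, those page numbers are the ones to inspect, and any `error` values in `stats` may hint at why a page produced no text.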


There are 0 answers
