How can I extract text from a PDF document in a Flutter app?

I am working on a Flutter application and need to extract text from PDF documents. I tried the pdf package, but the only parser type I can find is PdfDocumentParserBase, which is an abstract class.

I don't see any concrete implementation of that class.

Here is a snippet from my code:

import 'dart:html' as html;
import 'dart:typed_data';
import 'package:pdf/pdf.dart';

Future<void> _processFileContents(html.File file) async {
  setState(() {
    _isProcessing = true;
    _processingText = "Please wait";
  });

  final reader = html.FileReader();
  reader.readAsArrayBuffer(file);
  await reader.onLoad.first;

  final List<int> bytes = reader.result as List<int>;

  // Process the contents of the file as needed
  final pdfDocument = PdfDocument.load(pdfDocumentParser); // I am not sure how to get this pdfDocumentParser

  // Extract text from each page of the PDF
  final List<String> pdfText = [];
  for (var page in pdfDocument.pages!) {
    pdfText.add(page.text!);
  }

  // Display or process extracted text as needed
  print(pdfText);

  setState(() {
    _isProcessing = false;
    _processingText = "";
    _isPdfProcessed = true;
  });
}

I have already tried the pdf package, as described above. I couldn't find much documentation online or on Stack Overflow, hence this question.

I have also checked a similar question, "How can I parse the contents of a PDF file in dart?", but there are no good answers in that thread either.

Can you recommend another good package, or the right way to use the pdf package, to extract text from a PDF document?

Any guidance or assistance with these issues would be immensely helpful. Thank you!

There is 1 answer

Michael On

After 24 hours of searching the web, I finally found the package syncfusion_flutter_pdf: ^24.2.9, which does exactly what is needed: extracting text from a PDF document. Here's how to use it in a few lines of code:

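import 'dart:typed_data';
import 'package:syncfusion_flutter_pdf/pdf.dart';

// readAsBytes() returns a Future, so await it inside an async function.
final Uint8List bytes = await yourFile.readAsBytes();
final PdfDocument document = PdfDocument(inputBytes: bytes);
final String content = PdfTextExtractor(document).extractText();
document.dispose(); // release the resources held by the document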
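
Since the question loops over pages, here is a minimal sketch of per-page extraction with the same package. It assumes syncfusion_flutter_pdf is listed in pubspec.yaml. The name extractPageTexts is a hypothetical helper of my own, not part of the package:

```dart
import 'dart:typed_data';

import 'package:syncfusion_flutter_pdf/pdf.dart';

/// Hypothetical helper (not part of the package): returns one string per page.
List<String> extractPageTexts(Uint8List bytes) {
  final PdfDocument document = PdfDocument(inputBytes: bytes);
  try {
    final PdfTextExtractor extractor = PdfTextExtractor(document);
    final List<String> pageTexts = <String>[];
    for (int i = 0; i < document.pages.count; i++) {
      // Limit extraction to page i by using it as both start and end index.
      pageTexts.add(extractor.extractText(startPageIndex: i, endPageIndex: i));
    }
    return pageTexts;
  } finally {
    // Dispose the document even if extraction throws.
    document.dispose();
  }
}
```

In the web code from the question, the bytes would probably come from the FileReader once it has loaded, for example by passing reader.result as Uint8List to this helper.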