How to extract content of PDF in React.js?


I am trying to load a PDF file from local storage and extract its content in React.js, without any backend.

I searched Google for suitable modules but haven't found one yet. There are many Node modules for parsing PDFs, and I can extract the content of a PDF on the backend, but I am not sure whether they can be used in web browsers.

There are 2 answers

0
Pythoner On BEST ANSWER

I tried this, and pdfjs-dist no longer worked for me. A better alternative for extracting text from a PDF directly within React was react-pdftotext.

1. Install the library:

npm install react-pdftotext

2. Import the library:

import pdfToText from 'react-pdftotext'

3. Create an input field:

<input type="file" accept="application/pdf" onChange={extractText}/>

4. Prepare a function:

    function extractText(event) {
        const file = event.target.files[0]
        pdfToText(file)
            .then(text => console.log(text))
            .catch(error => console.error("Failed to extract text from pdf", error))
    }

Finally, bringing it all together:

import pdfToText from 'react-pdftotext'


// Read the file chosen in the input and log its extracted text.
function extractText(event) {
    const file = event.target.files[0]
    pdfToText(file)
        .then(text => console.log(text))
        .catch(error => console.error("Failed to extract text from pdf", error))
}

function PDFParserReact() {

    return (
        <div className="App">
            <header className="App-header">
                <input type="file" accept="application/pdf" onChange={extractText}/>
            </header>
        </div>
    );
}
export default PDFParserReact;
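
One thing the handler above does not cover is the case where the user cancels the file dialog or picks something that is not a PDF, so `event.target.files[0]` may be undefined or the wrong type. Below is a minimal sketch of a guard for that, not part of react-pdftotext itself: `isPdfFile` and `extractTextFromInput` are hypothetical helper names, and the `extractor` parameter is an assumption that stands in for `pdfToText` (any function that takes a file and returns a promise of text).

```javascript
// Hypothetical helper: decide whether a selected file looks like a PDF.
function isPdfFile(file) {
    if (!file) return false;
    // `type` can be an empty string when the browser cannot determine it,
    // so fall back to checking the file extension.
    const byType = file.type === 'application/pdf';
    const byName = typeof file.name === 'string' &&
        file.name.toLowerCase().endsWith('.pdf');
    return byType || byName;
}

// Hypothetical wrapper around the change handler.
// `extractor` is assumed to behave like pdfToText: (file) => Promise<string>.
async function extractTextFromInput(event, extractor) {
    const file = event.target.files && event.target.files[0];
    if (!isPdfFile(file)) {
        // Nothing selected, or the selection is not a PDF.
        return null;
    }
    try {
        return await extractor(file);
    } catch (error) {
        console.error('Failed to extract text from pdf', error);
        return null;
    }
}
```

In the component this could be wired up as onChange={event => extractTextFromInput(event, pdfToText).then(text => console.log(text))}, where a null result means there was nothing to extract.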
1
UHpi On

If you are building a React application, you can use the "react-pdftotext" package to extract text from PDFs in the browser. For details on how to use this package, you can refer to this article.