How to extract text from PDF?

Alexandre Aragão

I'm creating a React application with Node.js, and it needs to get some text from a PDF that the user uploads.

I have already tried pdf-parse, pdf2json, pdf.js and react-pdf-js. The file should be selected by the user, but all of those libraries use a path to access the file. What should I do? PS1: I'm using an input type='file' button to get the file.

The code must work in both Node.js and the web browser.

Alexandre Aragão

I'm answering my own question. First I create a regular HTML input:

<input type='file'/>

I'm using React, so I use the onChange attribute instead of an id. When the user selects a file, a function is called, and I use the following code to get the file:

const file = event.target.files[0];
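
As a minimal sketch of such a handler, the helper below pulls the first selected file out of the change event and returns null when the selection is empty or the file is not reported as a PDF. The name pickPdfFile is hypothetical and only used for illustration; the check relies on the browser-reported MIME type, which is an assumption about how strict you want to be.

```javascript
// Hypothetical helper: returns the selected PDF file, or null if nothing usable was chosen.
function pickPdfFile(event) {
  const files = event.target.files;
  // The user may cancel the file dialog, which leaves the list empty.
  if (!files || files.length === 0) return null;
  const file = files[0];
  // File.type is the MIME type reported by the browser; it can be empty for unknown types.
  if (file.type !== "application/pdf") return null;
  return file;
}
```

In React this could be wired up with something like onChange={(event) => { const file = pickPdfFile(event); ... }} on the input.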

The file object has no path, which is what PDF.js would otherwise use to load the real file. So I use a FileReader to read the file into an ArrayBuffer, i.e. its raw bytes:

const fileReader = new FileReader();

Then we assign a function to fileReader.onload; the function can be found here:

fileReader.onload = function() {...}
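
The body of that function is not shown above, so here is a hedged sketch of what the extraction step could look like. It assumes a recent pdf.js build in which getDocument returns a loading task with a promise property (older builds returned a promise directly). The function name extractPdfText is hypothetical, and the library object is passed in as a parameter so the same code can be given pdfjsLib in the browser or a required build in Node.js.

```javascript
// Sketch: extract the text of every page from PDF bytes.
// `pdfjs` is the pdf.js library object (an assumption: a recent build
// where getDocument(...) returns a loading task with a `promise`).
async function extractPdfText(pdfjs, data) {
  const pdf = await pdfjs.getDocument({ data }).promise;
  const pages = [];
  // pdf.js page numbers start at 1.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Each text item carries its string in `str`.
    pages.push(content.items.map((item) => item.str).join(" "));
  }
  // One line of output per page.
  return pages.join("\n");
}
```

Inside the onload function, this could be called as extractPdfText(pdfjsLib, new Uint8Array(this.result)), since this.result holds the ArrayBuffer produced by the FileReader.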

Finally we do this:

fileReader.readAsArrayBuffer(file);
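
The FileReader steps can also be wrapped in a Promise, which makes them easier to chain. This is only a sketch: the name readAsUint8Array is hypothetical, and the fallback to the Blob arrayBuffer method assumes a modern browser or a recent Node.js version, where FileReader is not available as a global.

```javascript
// Hypothetical helper: read a File or Blob into a Uint8Array.
function readAsUint8Array(file) {
  // Outside browsers FileReader is usually missing; fall back to Blob#arrayBuffer.
  if (typeof FileReader === "undefined") {
    return file.arrayBuffer().then((buffer) => new Uint8Array(buffer));
  }
  return new Promise((resolve, reject) => {
    const fileReader = new FileReader();
    // fileReader.result is an ArrayBuffer once loading has finished.
    fileReader.onload = () => resolve(new Uint8Array(fileReader.result));
    fileReader.onerror = () => reject(fileReader.error);
    fileReader.readAsArrayBuffer(file);
  });
}
```

The resulting Uint8Array can then be handed to PDF.js as the document data.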

Important PS: in newer PDF.js versions, pdf.pdfInfo must be replaced with pdf.

Thanks for helping.

Extra PS: To use pdfjsLib as PDFJS in React, I did this in the index.html file:

window.PDFJS = pdfjsLib
