Benben Benben - 4 months ago 22
Java Question

How to extract text from a PDF file with Apache PDFBox

I would like to extract text from a given PDF file with Apache PDFBox.
I wrote the following code:

PDFTextStripper pdfStripper = null;
PDDocument pdDoc = null;
COSDocument cosDoc = null;
File file = new File(filepath);

PDFParser parser = new PDFParser(new FileInputStream(file));
parser.parse();
cosDoc = parser.getDocument();
pdfStripper = new PDFTextStripper();
pdDoc = new PDDocument(cosDoc);
pdfStripper.setStartPage(1);
pdfStripper.setEndPage(5);
String parsedText = pdfStripper.getText(pdDoc);
System.out.println(parsedText);


However, I got the following error.

Exception in thread "main" java.lang.NullPointerException
at org.apache.fontbox.afm.AFMParser.main(AFMParser.java:304)


I added pdfbox-1.8.5.jar and fontbox-1.8.5.jar to the build path.

=====edit=====

I added
System.out.println("program starts");

at the beginning of the program.

I ran it and got the same error as above, and
program starts
did not appear in the console.
So I think the problem is with the classpath or something similar.

Thank you.

Answer

I ran your program and it works correctly for me. Your problem may be related to the file path you pass to File. I put my PDF on the C: drive and hardcoded the file path. Here is my code:

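import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // package location in PDFBox 1.8.x

public class PDFReader {
    public static void main(String[] args) {
        PDDocument pdDoc = null;
        File file = new File("C:/my.pdf");
        try {
            // Parse the raw PDF into a low-level COS document.
            PDFParser parser = new PDFParser(new FileInputStream(file));
            parser.parse();
            COSDocument cosDoc = parser.getDocument();
            pdDoc = new PDDocument(cosDoc);

            // Extract the text of pages 1 through 5 (page numbers are 1-based).
            PDFTextStripper pdfStripper = new PDFTextStripper();
            pdfStripper.setStartPage(1);
            pdfStripper.setEndPage(5);
            String parsedText = pdfStripper.getText(pdDoc);
            System.out.println(parsedText);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Release the document's resources once the text has been extracted.
            if (pdDoc != null) {
                try {
                    pdDoc.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}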
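
To rule out the file path as the cause, it may help to check the path before handing it to PDFBox. The following is only a minimal sketch using the standard java.io.File API; the class name PdfPathCheck and the helper describe are hypothetical names chosen for illustration and are not part of PDFBox.

```java
import java.io.File;

class PdfPathCheck {
    // Hypothetical helper (not part of PDFBox): returns "OK" when the path
    // points to a readable regular file, otherwise a description of the problem.
    static String describe(File file) {
        if (!file.exists()) {
            return "File not found: " + file.getAbsolutePath();
        }
        if (!file.isFile()) {
            return "Not a regular file: " + file.getAbsolutePath();
        }
        if (!file.canRead()) {
            return "File is not readable: " + file.getAbsolutePath();
        }
        return "OK";
    }

    public static void main(String[] args) {
        // Use the first argument as the path, falling back to the path hardcoded above.
        File file = new File(args.length > 0 ? args[0] : "C:/my.pdf");
        System.out.println(describe(file));
    }
}
```

If the path checks out, it is also worth looking at the stack trace in the question: its only frame is AFMParser.main, and "program starts" was never printed. Together these suggest the run configuration may be launching that class instead of yours, so checking which main class is selected could be a useful next step.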