Lee Meador - 1 month ago
Java Question

Can duplicating a pdf with PDFBox be small like with iText?

I am reading in a PDF and outputting a PDF with multiple copies of the original PDF in it. I tested this by doing the same thing with both PDFBox and iText. iText creates a much smaller output if I duplicate each page individually.

The question: Is there another way to do this in PDFBox that results in smaller output PDFs?

For one example input file, generating two copies to the output with both tools:


  • Original PDF size: 30K

  • PDFBox (v 1.7.1) generated PDF: 84K

  • iText (v 5.3.4) generated PDF: 35K



Java code for PDFBox (sorry to inflict error handling on you). Notice how it reads the input over and over and duplicates it as a whole:

PDFMergerUtility merger = new PDFMergerUtility();
PDDocument workplace = null;
try {
    for (int cnt = 0; cnt < COPIES; ++cnt) {
        PDDocument document = null;
        InputStream stream = null;
        try {
            stream = new FileInputStream(new File(sourceFileName));
            document = PDDocument.load(stream);
            if (workplace == null) {
                // the first loaded copy becomes the output document itself
                workplace = document;
            } else {
                // every later copy is appended to it as a whole document
                merger.appendDocument(workplace, document);
            }
        } finally {
            if (document != null && document != workplace) {
                document.close();
            }
            if (stream != null) {
                stream.close();
            }
        }
    }

    OutputStream out = null;
    try {
        out = new FileOutputStream(new File(destinationFileName));
        workplace.save(out);
    } finally {
        if (out != null) {
            out.close();
        }
    }
} catch (COSVisitorException e1) {
    e1.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (workplace != null) {
        try {
            workplace.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


Code to do it with iText. Notice how it loads the input file page by page and transfers each page to the output:

Document document = null;
PdfReader reader = null;
InputStream inputStream = null;
FileOutputStream outputStream = null;
try {
    inputStream = new FileInputStream(new File(sourceFileName));
    outputStream = new FileOutputStream(new File(destinationFileName));
    document = new Document();
    // PdfSmartCopy caches resources it has already written and reuses them
    PdfCopy copy = new PdfSmartCopy(document, outputStream);
    document.open();
    reader = new PdfReader(inputStream);
    // loop over the pages in that document (iText page numbers start at 1)
    int pdfPageNo = reader.getNumberOfPages();
    for (int page = 1; page <= pdfPageNo; ++page) {
        PdfImportedPage onePage = copy.getImportedPage(reader, page);
        // duplicate each page N times
        for (int i = 0; i < COPIES; ++i) {
            copy.addPage(onePage);
        }
    }
    copy.freeReader(reader);
} catch (DocumentException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (reader != null) {
        reader.close();
    }
    if (document != null) {
        document.close();
    }
    try {
        if (inputStream != null) {
            inputStream.close();
        }
        if (outputStream != null) {
            outputStream.close();
        }
    } catch (IOException e) {
        // do nothing
    }
}


Both are surrounded by this:

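import java.io.*;
// imports for the PDFBox 1.x version
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFMergerUtility;
// imports for the iText 5.x version
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSmartCopy;

public class Duplicate {

    /** The original PDF file */
    private static final String sourceFileName = "PDF_CI_US2CA.pdf";

    /** The resulting PDF file. */
    private static final String destinationFileName = "itext_output.pdf";
    private static final int COPIES = 2;

    public static void main(String[] args) {
        ...
    }
}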
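A toy model in plain Java can make the size difference concrete. It is not PDFBox or iText code: CopySizeModel, its methods and the 20 KB / 4 KB sizes are hypothetical. It assumes that a PDF writer stores each distinct object once no matter how many pages refer to it, that appending a freshly loaded copy of the document (as the PDFBox loop does) brings in a separate copy of every font and image, and that adding one imported page several times (as the iText loop does) makes all duplicates point at the same objects.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not PDFBox or iText code) of why shared objects keep a copied PDF small. */
final class CopySizeModel {

    /** Stand-in for a PDF object: a payload size in KB plus references to other objects. */
    static final class Obj {
        final int sizeKb;
        final List<Obj> refs = new ArrayList<>();

        Obj(int sizeKb) {
            this.sizeKb = sizeKb;
        }
    }

    /** Deep copy: every reachable object gets a new instance (memoized within one copy). */
    static Obj deepCopy(Obj o, Map<Obj, Obj> copies) {
        Obj existing = copies.get(o);
        if (existing != null) {
            return existing;
        }
        Obj copy = new Obj(o.sizeKb);
        copies.put(o, copy);
        for (Obj r : o.refs) {
            copy.refs.add(deepCopy(r, copies));
        }
        return copy;
    }

    /** Like a PDF writer: each distinct instance is written once, however often it is referenced. */
    static int sizeOnDisk(List<Obj> pages) {
        Map<Obj, Boolean> written = new IdentityHashMap<>();
        int total = 0;
        for (Obj p : pages) {
            total += write(p, written);
        }
        return total;
    }

    private static int write(Obj o, Map<Obj, Boolean> written) {
        if (written.put(o, Boolean.TRUE) != null) {
            return 0; // already written; another reference adds nothing in this model
        }
        int total = o.sizeKb;
        for (Obj r : o.refs) {
            total += write(r, written);
        }
        return total;
    }

    /** Whole source duplicated per copy: each copy brings its own objects. */
    static int cloneWholeSource(Obj sourcePage, int copies) {
        List<Obj> pages = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            pages.add(deepCopy(sourcePage, new IdentityHashMap<>()));
        }
        return sizeOnDisk(pages);
    }

    /** Page imported once, then added several times: every page entry points at the same objects. */
    static int shareImportedPage(Obj sourcePage, int copies) {
        List<Obj> pages = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            Obj pageEntry = new Obj(0); // new page dictionary; its few bytes are ignored here
            pageEntry.refs.add(sourcePage);
            pages.add(pageEntry);
        }
        return sizeOnDisk(pages);
    }

    public static void main(String[] args) {
        Obj font = new Obj(20); // hypothetical embedded font
        Obj page = new Obj(4);  // hypothetical page content stream
        page.refs.add(font);
        System.out.println("clone whole source:  " + cloneWholeSource(page, 2) + " KB");  // 48 KB
        System.out.println("share imported page: " + shareImportedPage(page, 2) + " KB"); // 24 KB
    }
}
```

With two copies the model gives 48 KB for cloning the whole source and 24 KB for sharing the imported page; more copies grow the first number linearly while the second stays flat. Real files differ (page dictionaries, cross-reference tables and compression all add bytes), so this only illustrates the trend.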

Answer

Using the following solution, I was able to create a PDF file with many duplicate pages while keeping the increase in file size minimal.

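import java.io.IOException;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

PDDocument samplePdf = null;
try {
    samplePdf = PDDocument.load(PDF_PATH);
    // first page of the source (getAllPages() returns a raw List in PDFBox 1.x)
    PDPage page = (PDPage) samplePdf.getDocumentCatalog().getAllPages().get(0);

    // append COPIES imported copies of that page to the same document
    for (int i = 0; i < COPIES; i++) {
        samplePdf.importPage(page);
    }

    samplePdf.save(SAVE_PATH);
} catch (IOException e) {
    e.printStackTrace();
} catch (COSVisitorException e) {
    e.printStackTrace();
} finally {
    if (samplePdf != null) {
        try {
            samplePdf.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}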

In my first attempt I used samplePdf.addPage(page), but it didn't work as expected, so there is clearly a difference between addPage() and importPage(). I'll have to check the source or documentation to see why. Note that this code duplicates only the first page; for a multi-page input, loop over getAllPages() and import each page. Anyway, this should help you devise a solution for your needs with PDFBox.
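One possible explanation, stated as an assumption and not checked against the PDFBox documentation: addPage(page) puts the very same page object into the page tree again, while importPage(page) creates a new page dictionary with its own copy of the content stream but keeps referring to the original resources (fonts, images), so those are written only once. The toy model below mimics that assumed behaviour with identity comparisons. It is plain Java, not PDFBox code; AddVsImportModel, PageDict, samplePage and kidsAfter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (not PDFBox code). ASSUMPTION: addPage reuses the same page object,
 * importPage shallow-copies the page dictionary, copies the contents and shares
 * everything else (e.g. /Resources).
 */
final class AddVsImportModel {

    /** A page dictionary: entry name -> value object. */
    static final class PageDict {
        final Map<String, Object> entries;

        PageDict(Map<String, Object> entries) {
            this.entries = entries;
        }
    }

    /** A page with a content stream and a resources object. */
    static PageDict samplePage() {
        Map<String, Object> entries = new HashMap<>();
        entries.put("Resources", new Object()); // stands in for fonts and images
        entries.put("Contents", new byte[] {1, 2, 3});
        return new PageDict(entries);
    }

    /** Page-tree kids after adding `copies` duplicates of `original` one way or the other. */
    static List<PageDict> kidsAfter(PageDict original, int copies, boolean useImport) {
        List<PageDict> kids = new ArrayList<>();
        kids.add(original); // the page is already in the document
        for (int i = 0; i < copies; i++) {
            if (useImport) {
                PageDict copy = new PageDict(new HashMap<>(original.entries)); // shallow: values shared
                byte[] contents = (byte[]) original.entries.get("Contents");
                copy.entries.put("Contents", Arrays.copyOf(contents, contents.length));
                kids.add(copy);
            } else {
                kids.add(original); // the very same object again
            }
        }
        return kids;
    }

    /** Number of distinct (by identity) page objects in the kids list. */
    static int distinctPages(List<PageDict> kids) {
        return distinct(new ArrayList<Object>(kids));
    }

    /** Number of distinct (by identity) values stored under `key` across the pages. */
    static int distinctValues(List<PageDict> kids, String key) {
        List<Object> values = new ArrayList<>();
        for (PageDict p : kids) {
            values.add(p.entries.get(key));
        }
        return distinct(values);
    }

    private static int distinct(List<Object> values) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.addAll(values);
        return seen.size();
    }

    public static void main(String[] args) {
        PageDict page = samplePage();
        List<PageDict> viaAdd = kidsAfter(page, 2, false);
        List<PageDict> viaImport = kidsAfter(page, 2, true);
        System.out.println("addPage:    " + viaAdd.size() + " kids, "
                + distinctPages(viaAdd) + " distinct page object(s)");
        System.out.println("importPage: " + viaImport.size() + " kids, "
                + distinctPages(viaImport) + " distinct page objects, "
                + distinctValues(viaImport, "Resources") + " shared resources object");
    }
}
```

Under this assumption, addPage leaves one page object referenced three times in the page tree, while importPage yields three separate pages that still share a single resources object, which would explain why the imported copies cost so little space.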