PJP PJP - 3 months ago 21
R Question

Can I reduce pdf file size in knitR/ggplot2 when using a large dataset without using external tools?

I have a number of large-ish files which I am reading into R in an rmarkdown document, cleaning up, and plotting with ggplot2.

Most files are about 3 MB in size with around 80,000 lines of data, but some are 12 MB in size, with 318,406 lines of data (Time, Extension, Load).

Time,Extension,Load
(sec),(mm),(N)
"0.00000","0.00000","-4.95665"
"0.00200","0.00000","-4.95677"
"0.00400","0.00000","-4.95691"
"0.10400","-0.00040","-4.95423"


It takes a while to churn through the data and create the PDF file (that's OK), but the PDF file is now nearly 6 MB in size with about 16 graphs in there (in fact 3 graphs, which are facet plots made with ggplot2).

I understand that the PDF includes a line segment for every data point in my dataset, so as I increase the number of graphs the amount of data in the file increases. However, I don't foresee a requirement to drill down into the PDF document to see that level of detail, and I will have problems emailing it around as it approaches 10 MB.

If I convert pdf to ps using pdf2ps and then go back to pdf with ps2pdf, I get a file about 1/3 of the size of the original pdf, and the quality looks great.

So is there a method from within R/knitR/ggplot2 to reduce the number of points plotted in the PDF images without using an external tool to compress the PDF file (or to otherwise optimise the generated PDF)?

Cheers
Pete

Answer

You can try changing the graphic device from pdf to png by adding

knitr::opts_chunk$set(dev = 'png')

to your setup chunk.
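As a slightly fuller sketch of that setup chunk: knitr also has a `dpi` chunk option that controls the resolution of raster figures, which trades sharpness against file size. The value 150 below is only an illustrative assumption, not a recommendation from this answer.

```r
# Setup-chunk sketch: render every figure as a PNG bitmap instead of
# vector PDF, so the stored image no longer contains one segment per point.
# dpi = 150 is an assumed example value; adjust it to taste.
knitr::opts_chunk$set(dev = "png", dpi = 150)

# Confirm what later chunks will use.
knitr::opts_chunk$get("dev")
knitr::opts_chunk$get("dpi")
```

The same options can be set on a single chunk instead, if only the heaviest plots need rasterising.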

Or you can add this to the output section of your document's YAML header:

output:
  pdf_document:
    dev: png

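Try different raster devices (png, jpeg). A raster image stores pixels rather than one line segment per data point, so the file size should no longer grow with the number of rows plotted.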
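If you would rather keep vector output, another option that stays inside R is to thin the data before it reaches ggplot2, which is closer to what the question literally asks for. The following is only a minimal sketch: `thin_rows()` and its `max_points` argument are hypothetical names made up for this example, and the data frame is synthetic, standing in for one of the 318,406-row files.

```r
library(ggplot2)

# Hypothetical helper (not part of any package): keep every n-th row
# so that at most `max_points` rows are left for plotting.
thin_rows <- function(df, max_points = 5000) {
  if (nrow(df) == 0) return(df)
  step <- max(1, ceiling(nrow(df) / max_points))
  df[seq(1, nrow(df), by = step), , drop = FALSE]
}

# Synthetic stand-in for a large file; these are not real measurements.
n <- 318406
dat <- data.frame(Time = seq(0, by = 0.002, length.out = n))
dat$Load <- sin(dat$Time / 20)

small <- thin_rows(dat)

# The plot now carries at most 5000 segments instead of one per row.
p <- ggplot(small, aes(Time, Load)) + geom_line()
```

Evenly spaced thinning can drop short spikes between the kept rows, so check the thinned plot against the full one before relying on it.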
