Fylux - 26 days ago
Javascript Question

How to avoid downloading the entire PDF to display

On my webpage you can read books in PDF format. The problem is that some books have around 1000 pages and the PDF is really big, so even if the user reads just 10 pages, the full PDF is downloaded from the server. This is awful for my hosting account because I have a transfer limit.

What could I do to display the PDF without loading the full file?

I use PDF.js.

Greetings.

Answer

ORIGINAL POST:

PDF files are laid out in a way that normally forces the client side to download the whole file just to get the first page.

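The end of the PDF file (the trailer) tells the PDF reader where to find the cross-reference table and the root dictionary. The root dictionary tells the reader about the page catalog (the order of pages) and other data used by the reader. Since that information sits at the end, a reader that receives the file sequentially cannot locate the first page until the download finishes.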
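To make that a bit more concrete, here is a minimal sketch in plain Ruby of what a reader does first: it looks at the tail of the file for the `startxref` keyword, which is followed by the byte offset of the cross-reference data. The helper name `startxref_offset` and the sample bytes are made up for illustration; this is not how any particular reader is implemented.

```ruby
# Hypothetical helper: return the byte offset recorded after the
# "startxref" keyword near the end of a PDF, or nil if none is found.
def startxref_offset(pdf_bytes, tail_size = 1024)
  # Only the tail of the file is inspected, which is where a reader starts.
  start = [pdf_bytes.bytesize - tail_size, 0].max
  tail = pdf_bytes.byteslice(start, tail_size).b
  # Use the last match in case the file was updated incrementally.
  match = tail.scan(/startxref\s+(\d+)\s+%%EOF/).last
  match && match.first.to_i
end

# A made-up fragment standing in for the end of a real PDF file.
sample_tail = "trailer\n<< /Size 6 /Root 1 0 R >>\nstartxref\n1234\n%%EOF\n"
puts startxref_offset(sample_tail)
```

With a real book you would pass in the file's bytes (for example from `IO.binread`); the point is only that the information needed to find page 1 is stored at the far end of the file.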

So, as you can see, the limitations of the PDF design require that you use a server-side solution that will create a new PDF with only the page(s) you want to display.

The best solution (in my opinion) is to create a "reader" page (as opposed to a download page) that requests a specific page from the server and allows the user to advance page by page (using AJAX).

The server will need to create a new PDF (file or stream) that contains only the requested page and return it to the reader.

If you are running your server with Ruby (Ruby on Rails), you can use the combine_pdf gem to load the PDF and send just one page...

You can define a controller method that will look something like this:

def get_page
    # read and parse the book (needs the combine_pdf gem in your Gemfile)
    book = CombinePDF.load("book.pdf")
    # create empty PDF
    pdf_with_one_page = CombinePDF.new
    # add the page you want
    # notice that params values are strings and that the pages array
    # is indexed from 0, so the user input is converted and adjusted...
    pdf_with_one_page << book.pages[ params[:page_number].to_i - 1 ]
    # no need to create a file, just stream the data to the client.
    send_data pdf_with_one_page.to_pdf, type: 'application/pdf', disposition: 'inline'
end
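
Since the page number comes straight from the user, it is probably worth validating it before indexing into the pages array (0, a negative number, or a number past the last page would otherwise select the wrong page or nothing at all). A minimal sketch in plain Ruby, where `page_index` is a hypothetical helper and not part of combine_pdf or Rails:

```ruby
# Hypothetical helper: convert the raw page_number parameter (a string,
# counted from 1) into a 0-based array index, or nil when it is invalid.
def page_index(raw_page_number, page_count)
  # Integer() rejects non-numeric input such as "abc" or "3x",
  # which String#to_i would silently turn into 0 or 3.
  number = Integer(raw_page_number.to_s, 10, exception: false)
  return nil if number.nil?
  # Only pages 1..page_count exist.
  return nil unless number.between?(1, page_count)
  number - 1
end

p page_index("1", 1000)    # => 0
p page_index("0", 1000)    # => nil
p page_index("1001", 1000) # => nil
p page_index("abc", 1000)  # => nil
```

In the controller you could call it with `params[:page_number]` and `book.pages.length`, and answer with a 404 when it returns nil.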

If you are running PHP or Node.js, you will need to find a different server-side solution.

Good luck!

EDIT:

I was looking over the PDF.js project (which looks very nice) and noticed the limited support statement for Safari: "Safari (desktop and mobile) lacks a number of features or has defects, e.g. in typed arrays or HTTP range requests"...

I understand from this statement that on some browsers you can manage a client-side solution based on the HTTP Byte Serving protocol.

This will NOT work with all browsers, but it will keep you from having to use a server-side solution.

I couldn't find the documentation for the PDF.js feature (maybe it defaults to ranges and you just need to set the range...?), but I would go with a server-side solution that I know to work on all browsers.
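
For anyone unfamiliar with byte serving, here is a rough sketch in plain Ruby of what a server does with a `Range` request header: it returns only the requested slice with a 206 status instead of the whole file. `serve_range` is a hypothetical helper that works on an in-memory string and only understands the simple `bytes=start-end` form, so take it as an illustration of the mechanics and not as a drop-in for any real web server.

```ruby
# Hypothetical helper: answer a "Range: bytes=start-end" request header
# against an in-memory file. Returns [status, headers, body].
def serve_range(data, range_header)
  size = data.bytesize
  full = [200, { "Content-Length" => size.to_s }, data]
  match = /\Abytes=(\d+)-(\d*)\z/.match(range_header.to_s)
  # No Range header, or a form this sketch does not handle: send everything.
  return full unless match

  first = match[1].to_i
  # An empty end position means "until the end of the file".
  last = match[2].empty? ? size - 1 : match[2].to_i
  # An end before the start is an invalid range, which is simply ignored.
  return full if last < first
  # A range starting past the end of the file cannot be satisfied.
  return [416, { "Content-Range" => "bytes */#{size}" }, ""] if first >= size

  last = [last, size - 1].min
  body = data.byteslice(first, last - first + 1)
  headers = {
    "Content-Range" => "bytes #{first}-#{last}/#{size}",
    "Content-Length" => body.bytesize.to_s
  }
  [206, headers, body]
end

status, headers, body = serve_range("0123456789", "bytes=2-5")
puts status                   # 206
puts headers["Content-Range"] # bytes 2-5/10
puts body                     # 2345
```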

EDIT 2:

Ignore Edit 1. As iPDFdev pointed out (thank you, iPDFdev), this requires a special layout of the PDF file and will not resolve the issue of the browser downloading the whole file.
