Chords - 2 months ago
Bash Question

Generate PDF Behind Authentication Wall

I'm trying to generate a PDF with WKHTMLTOPDF from a page that requires me to log in first. There's some material on this on the internet already, but I can't seem to get mine working. I'm in Terminal - nothing fancy.

I've tried (among a whole lot of other stuff):

/usr/bin/wkhtmltopdf --post username=myusername --post password=mypassword "URL to Generate" test.pdf

/usr/bin/wkhtmltopdf --username myusername --password mypassword "URL to Generate" test.pdf

/usr/bin/wkhtmltopdf --cookie-jar my.jar --post username=myusername --post password=mypassword "URL to Generate Cookie For"


username and password are both the id and the name of the input fields on the form. I am getting the my.jar file to show up, but nothing is written to it.

Specific questions:


  1. Should I be specifying the login page and/or form action anywhere?

  2. The --cookie-jar parameter has been mentioned in various places (both as being needed and otherwise). If it is necessary, how does it work? I've created the my.jar file, but how do I use it again? Referencing:



http://code.google.com/p/wkhtmltopdf/issues/detail?id=356




EDIT:

Surely someone has done this successfully? A good way to showcase an example might be for someone to get it working on a popular website that requires login credentials, to eliminate a potential variable.

Answer

I think the form I'm trying to log in to is too complex. It's secure, sets three cookies, redirects twice, and posts a number of other variables outside of the username and password, one of which requires a cookie value (I even tried concatenating the value into the post variable, but no luck). This is probably a pretty rare issue - by no means the fault of WKHTMLTOPDF.
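For a simpler form than mine, the WKHTMLTOPDF-only route would look roughly like the sketch below. It assumes the form fields are named username and password (as in the question) and that the first URL is the form's action URL, not just the page that displays the form. Note that --post takes the field name and the value as two separate arguments, and that the same --cookie-jar file is passed to both runs. The function name, URLs and file names are placeholders of mine.

```shell
#!/usr/bin/env bash
# Sketch only: two wkhtmltopdf runs sharing one cookie jar.
# The binary can be overridden, e.g. WKHTMLTOPDF_BIN=/usr/bin/wkhtmltopdf
WKHTMLTOPDF_BIN="${WKHTMLTOPDF_BIN:-wkhtmltopdf}"

pdf_with_cookie_jar() {
  local login_url="$1" page_url="$2" user="$3" pass="$4" out_pdf="$5"
  local jar="my.jar"   # placeholder jar file name

  # Run 1: POST the credentials to the form's action URL.
  # Session cookies are written to the jar; the PDF produced here is throwaway.
  # The exit status is not checked, since only the cookies matter.
  "$WKHTMLTOPDF_BIN" --cookie-jar "$jar" \
    --post username "$user" --post password "$pass" \
    "$login_url" login-result.pdf

  # Run 2: reuse the same jar so the request carries the session cookies.
  "$WKHTMLTOPDF_BIN" --cookie-jar "$jar" "$page_url" "$out_pdf"
}
```

Called as, say, pdf_with_cookie_jar "https://example.com/login" "https://example.com/report" myusername mypassword test.pdf. If the jar stays empty after the first run, the login itself most likely failed, which is what happened with my form.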

I wound up using CURL to log in and write the page to a local file, then ran WKHTMLTOPDF against that. Definitely a solid workaround for anyone else having a similar issue.
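From the terminal, that workaround can be sketched as three steps: log in with curl while saving cookies, fetch the protected page with those cookies, then point WKHTMLTOPDF at the saved file. This is a minimal sketch, not my exact script: the function name, URLs and file names are placeholders, and a form like mine would need extra --data-urlencode entries for its other fields.

```shell
#!/usr/bin/env bash
# Sketch only: log in with curl, save the page, render the saved HTML.
# Binaries can be overridden through the environment.
CURL_BIN="${CURL_BIN:-curl}"
WKHTMLTOPDF_BIN="${WKHTMLTOPDF_BIN:-wkhtmltopdf}"

fetch_and_render() {
  local login_url="$1" page_url="$2" user="$3" pass="$4" out_pdf="$5"
  local jar="cookies.txt" html="page.html"   # placeholder file names

  # Step 1: POST the credentials. -c writes cookies to the jar, -b sends
  # them back on the redirects that -L follows. The response body is discarded.
  "$CURL_BIN" -sS -L -c "$jar" -b "$jar" \
    --data-urlencode "username=$user" \
    --data-urlencode "password=$pass" \
    -o /dev/null "$login_url" || return 1

  # Step 2: fetch the protected page with the session cookies and save it.
  "$CURL_BIN" -sS -L -b "$jar" -o "$html" "$page_url" || return 1

  # Step 3: render the local copy to PDF.
  "$WKHTMLTOPDF_BIN" "$html" "$out_pdf"
}
```

Called as, say, fetch_and_render "https://example.com/login" "https://example.com/report" myusername mypassword test.pdf. One caveat with rendering a local copy: relative links to stylesheets and images in the saved HTML may no longer resolve, so the PDF can look different from the live page.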


Edit: CURL, if interested:

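$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0); # Change to 1 to see the response headers when debugging
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); # Turns off certificate verification; avoid outside of testing
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); # Follow the redirects after login
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_URL, $loginUrl); # $loginUrl: the form's action URL, set elsewhere
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields); # $postFields: username, password and the form's other fields, set elsewhere
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt'); # Cookies are written here when the handle is closed
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt'); # Cookies are read from here for the requests
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); # Make curl_exec() return the response instead of printing it
$response = curl_exec($ch);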