Florian Bidabe - 5 months ago
JSON Question

cURL extracts the wrong hyperlink from a webpage in bash

I am trying to use cURL to extract a hyperlink from Adobe:



When I use cURL from the command line, the link I get is a default one, "http://www.adobe.com", instead of the one above.
I suspect that cURL is not "calling" the JavaScript or jQuery that populates the button with the right hyperlink.


Can anyone please point me in the right direction?
How can I get cURL to generate or extract the right link for this button?

Answer

You can use PhantomJS, a headless browser that runs the page's JavaScript, which cURL does not do.

Create a script like this, saved as load.js:

#! /usr/bin/phantomjs --ssl-protocol=any
// Load a page in PhantomJS, which runs the page's JavaScript, then print the download button tag.
var page = require('webpage').create(),
  system = require('system'),
  address;

if (system.args.length === 1) {
  console.log('Usage: load.js <some URL>');
  phantom.exit(1);
} else {
  // Present a desktop browser user agent to the site.
  page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';
  address = system.args[1];
  page.open(address, function(status) {
    if (status !== 'success') {
      console.log('FAIL to load the address: ' + status);
      phantom.exit(1);
    } else {
      // page.content is the page's current HTML; match() returns null when nothing is found.
      var btn = page.content.match(/<a id="buttonDownload" .*download-button">/);
      console.log(btn ? btn[0] : 'Download button not found');
      phantom.exit();
    }
  });
}

and invoke it like this (if your OS supports shebang lines and the script is executable):

$ ./load.js https://get.adobe.com/air

which prints:

<a id="buttonDownload" href="/air/download/?installer=Adobe_AIR_22.0_for_Win32&amp;standalone=1" class="Button ButtonYellow download-button">
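The href in the printed tag is root-relative and still HTML-escaped (&amp;), so it cannot be passed to cURL as is. As a minimal sketch, assuming the tag always has the form shown above, a small helper can turn it into a full URL. The name hrefFromTag and its origin argument are illustrative and not part of PhantomJS.

```javascript
// Hypothetical helper: pull the href out of a matched <a ...> tag and make it absolute.
function hrefFromTag(tag, origin) {
  var m = /href="([^"]*)"/.exec(tag);
  if (!m) {
    return null; // no href attribute in the tag
  }
  // The tag comes from HTML source, so '&' in the attribute arrives as '&amp;'.
  var href = m[1].replace(/&amp;/g, '&');
  // Prefix the origin for root-relative paths ('/x' but not '//host/x');
  // any other form is returned unchanged.
  if (href.charAt(0) === '/' && href.charAt(1) !== '/') {
    return origin + href;
  }
  return href;
}

var tag = '<a id="buttonDownload" href="/air/download/?installer=Adobe_AIR_22.0_for_Win32&amp;standalone=1" class="Button ButtonYellow download-button">';
console.log(hrefFromTag(tag, 'https://get.adobe.com'));
// prints https://get.adobe.com/air/download/?installer=Adobe_AIR_22.0_for_Win32&standalone=1
```

The function uses only plain JavaScript, so it could be pasted into load.js and called on the matched tag before printing.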

If your OS does not support shebang lines, run the script through phantomjs explicitly:

phantomjs --ssl-protocol=any load.js https://get.adobe.com/air
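Instead of matching page.content with a regular expression, the link can also be read from the DOM with page.evaluate, which runs a function inside the loaded page. The sketch below is a variant of the script above under stated assumptions: the file name href.js is hypothetical, the button id buttonDownload is taken from the output shown earlier, and the button is assumed to be populated by the time the load callback fires.

```javascript
// Runs inside the page: returns the button's resolved href, or null if the button is absent.
// The id 'buttonDownload' is an assumption taken from the output shown earlier.
function readDownloadHref() {
  var link = document.getElementById('buttonDownload');
  return link ? link.href : null;
}

// Only drive the browser when the script is actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var system = require('system');

  if (system.args.length === 1) {
    console.log('Usage: href.js <some URL>');
    phantom.exit(1);
  } else {
    page.open(system.args[1], function (status) {
      if (status !== 'success') {
        console.log('FAIL to load the address: ' + status);
        phantom.exit(1);
      } else {
        // page.evaluate executes the function in the page and returns its result.
        console.log(page.evaluate(readDownloadHref));
        phantom.exit();
      }
    });
  }
}
```

Run it the same way, for example phantomjs --ssl-protocol=any href.js https://get.adobe.com/air. Reading the href property rather than the raw attribute lets the browser resolve the relative path to an absolute URL.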