Florian Bidabe - 1 year ago

cURL extracts the wrong hyperlink from a webpage in bash

I am trying to use cURL to extract a hyperlink from an Adobe download page:

When I use cURL from the command line, the link I get is a default one, "http://www.adobe.com", instead of the one above.
I suspect that cURL is not running the JavaScript or jQuery that populates the button with the right hyperlink.


Can anyone point me in the right direction?
How can I get cURL to generate or extract the right link for this button?

Answer

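You can use PhantomJS, a headless WebKit browser. Unlike cURL, it executes the page's JavaScript, so the button's link has been filled in by the time you read the page.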
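To see why cURL alone returns the default link, here is a minimal Node.js sketch of what a plain HTTP fetch gives you: the markup exactly as the server sent it, before any script runs. The helper name staticButtonHref and the sample markup (a placeholder href that an inline script rewrites) are hypothetical; only the buttonDownload id and the default http://www.adobe.com link come from this thread.

```javascript
// Sketch: read the href of a button from a raw HTML string, i.e. the markup
// as cURL receives it. No JavaScript in the string is ever executed here.
// Assumes `id` contains no regex metacharacters and href is double-quoted.
function staticButtonHref(html, id) {
  // Find the opening <a ...> tag carrying the given id attribute.
  const tagRe = new RegExp('<a\\b[^>]*\\bid="' + id + '"[^>]*>', 'i');
  const tag = html.match(tagRe);
  if (!tag) return null;
  // Pull the href attribute out of that tag.
  const href = tag[0].match(/\bhref="([^"]*)"/i);
  return href ? href[1] : null;
}

// Hypothetical server response: the button ships with a placeholder href,
// and an inline script replaces it only when a browser runs the page.
const rawHtml =
  '<a id="buttonDownload" href="http://www.adobe.com" class="download-button">Download</a>' +
  '<script>document.getElementById("buttonDownload").href = "/air/download/";</script>';

// The <script> is just text in the string, so only the placeholder is seen.
console.log(staticButtonHref(rawHtml, 'buttonDownload')); // http://www.adobe.com
```

A browser engine such as PhantomJS would run that script first, which is why the answer below reads the page through it rather than through cURL.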

Create a script like this and save it as load.js:

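#! /usr/bin/phantomjs --ssl-protocol=any
// load.js: open a page in PhantomJS (which runs the page's JavaScript)
// and print the download button's <a> tag from the rendered page.
var page = require('webpage').create(),
  system = require('system'),
  address;

if (system.args.length === 1) {
  console.log('Usage: load.js <some URL>');
  phantom.exit(1);
}

// Present a desktop Chrome user agent (some sites serve different markup to unknown browsers).
page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';
address = system.args[1];
page.open(address, function(status) {
  if (status !== 'success') {
    console.log('FAIL to load the address: ' + status);
  } else {
    // page.content is the serialized DOM once the page has loaded, including script changes.
    // [^>]* keeps the match inside a single <a ...> tag.
    var btn = page.content.match(/<a id="buttonDownload" [^>]*download-button">/);
    console.log(btn ? btn[0] : 'Download button not found');
  }
  phantom.exit();
});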

Make it executable (chmod +x load.js) and, if your OS supports shebang lines, invoke it directly:

$ ./load.js https://get.adobe.com/air

This prints:

<a id="buttonDownload" href="/air/download/?installer=Adobe_AIR_22.0_for_Win32&amp;standalone=1" class="Button ButtonYellow download-button">

Otherwise, pass the script to phantomjs explicitly:

phantomjs --ssl-protocol=any load.js https://get.adobe.com/air
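The tag printed by load.js carries a relative, HTML-escaped href. If you want a URL you can hand back to cURL, a small Node.js helper can decode it and resolve it against the page address. This is a sketch, not part of the original answer: the function name tagToAbsoluteUrl is hypothetical, and it assumes a double-quoted href in which &amp; is the only entity that needs decoding.

```javascript
// Sketch: turn the <a> tag printed by load.js into an absolute URL.
// Assumes a double-quoted href and decodes only the &amp; entity.
function tagToAbsoluteUrl(tag, pageUrl) {
  const m = tag.match(/\bhref="([^"]*)"/i);
  if (!m) return null;
  // Serialized HTML escapes "&" inside attribute values as "&amp;".
  const href = m[1].replace(/&amp;/g, '&');
  // Resolve a relative href such as "/air/..." against the page it came from.
  return new URL(href, pageUrl).href;
}

const tag = '<a id="buttonDownload" href="/air/download/?installer=Adobe_AIR_22.0_for_Win32&amp;standalone=1" class="Button ButtonYellow download-button">';
console.log(tagToAbsoluteUrl(tag, 'https://get.adobe.com/air'));
// https://get.adobe.com/air/download/?installer=Adobe_AIR_22.0_for_Win32&standalone=1
```

Whether that endpoint serves the installer directly or redirects elsewhere is not shown in this thread, so if you fetch it with cURL, adding -L (follow redirects) is a reasonable precaution.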