CiscoKidx - 3 months ago
JavaScript Question

Scrape HTML from JS with Node.js and Horseman

I am trying to scrape the array of batters with salary information from this page:
https://www.swishanalytics.com/optimus/mlb/dfs-batter-projections

I am using Node.js and node-horseman.

Here is my code:

var Horseman = require('node-horseman');
var horseman = new Horseman();

horseman.open('https://www.swishanalytics.com/optimus/mlb/dfs-batter-projections');

if (horseman.status() === 200) {
    console.log('[+] Successful page opening');
    horseman.screenshot('image.png');
    console.log(horseman.html());
}
horseman.close();


The issue is that the output of horseman.html() is still mostly JavaScript, so the batter data cannot be extracted with an HTML parser like cheerio. How can I execute the JavaScript programmatically?

For example, if I view the source of the same page, the section containing the batters starts with:

function Model(){ this.batterArray =
[{"team_short":"rockies","mlbam_id":"571448","player_name":"Nolan Arenado",


Obviously this is still JavaScript. I'm assuming that at some point it must be executed and converted to HTML so a browser can display it?
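If the literal after `this.batterArray =` is plain JSON, as the excerpt suggests (double-quoted keys and values), the data could also be pulled straight out of the page source without executing anything. This is a swapped-in technique (string scanning plus `JSON.parse`), not what the answer below does, and `extractBatterArray` is a hypothetical helper. It assumes the source string, for example the output of horseman.html(), contains the same inline script seen in view-source:

```javascript
// Hypothetical helper: find the array literal assigned to this.batterArray in
// raw page source and parse it. Assumes the literal is valid JSON.
function extractBatterArray(source) {
  const markerPos = source.indexOf('this.batterArray');
  if (markerPos === -1) return null; // script not present in this source
  const start = source.indexOf('[', markerPos);
  if (start === -1) return null;

  // Walk forward counting [ { ] } so nested objects are handled, while
  // ignoring brackets that appear inside double-quoted JSON strings.
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === '[' || ch === '{') {
      depth++;
    } else if (ch === ']' || ch === '}') {
      depth--;
      if (depth === 0) return JSON.parse(source.slice(start, i + 1));
    }
  }
  return null; // literal was truncated or unbalanced
}
```

This breaks if the site ever emits non-JSON JavaScript in that literal (single quotes, unquoted keys, function calls), which is when actually executing the page, as in the answer, is the safer route.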

Answer

I just tested this out and it seems to work:

var Horseman = require('node-horseman');
var horseman = new Horseman();

horseman.open('https://www.swishanalytics.com/optimus/mlb/dfs-batter-projections');

if (horseman.status() === 200) {
    console.log('[+] Successful page opening');
    horseman.screenshot('image.png');
    // evaluate() runs this function inside the loaded page, where the site's own
    // Model constructor exists; only the serialized result comes back to Node.
    var batters = horseman.evaluate(function () {
        return (new Model()).batterArray;
    });
    console.log(batters);
}
horseman.close();

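horseman.evaluate runs the function inside the loaded page, where the site's own Model constructor is defined, and passes the result back to Node, so the return value must be JSON-serializable (a plain array of objects is). That gives you an array of batters that you can use in your code: you could write it out to a file or build a table from it.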
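For writing the array to a file or turning it into a table, here is a minimal sketch in plain Node.js. It assumes each batter is a flat object; since the full field list (including the salary fields) is not shown here, the CSV columns are simply the union of keys found in the objects. `battersToCsv`, `saveBatters` and any file names you pass are hypothetical:

```javascript
const fs = require('fs');

// Format one CSV cell: quote it (doubling embedded quotes) only when needed.
function csvCell(value) {
  if (value === null || value === undefined) return '';
  const text = typeof value === 'object' ? JSON.stringify(value) : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build a CSV table from an array of objects. Columns are the union of all
// keys in first-seen order; missing values become empty cells.
function battersToCsv(batters) {
  const columns = [];
  for (const row of batters) {
    for (const key of Object.keys(row)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  const lines = [columns.map(csvCell).join(',')];
  for (const row of batters) {
    lines.push(columns.map((col) => csvCell(row[col])).join(','));
  }
  return lines.join('\n') + '\n';
}

// Write a pretty-printed JSON dump and a CSV table side by side.
function saveBatters(batters, jsonPath, csvPath) {
  fs.writeFileSync(jsonPath, JSON.stringify(batters, null, 2));
  fs.writeFileSync(csvPath, battersToCsv(batters));
}
```

With the synchronous code above, something like `saveBatters(batters, 'batters.json', 'batters.csv')` right after `console.log(batters)` would produce both files; the CSV opens directly in a spreadsheet if a table is what you are after.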