Karthik Arumugham - 1 month ago
R Question

PhantomJS not loading webpages properly

The R code below takes website screenshots using RSelenium (1.4.5 from CRAN) and PhantomJS (1.9.8). It works fine on Mac OS and on Ubuntu 14 on AWS, but the webpage does not load properly when the same code runs on a DigitalOcean Ubuntu 14 droplet. Compare the screenshots from DigitalOcean and AWS for this URL: fantasy.premierleague.com/a/leagues/standings/1/classic. I used RStudio to view and save the images.

library(RSelenium)
pJS <- phantom()
browser = remoteDriver(browserName = "phantomJS")
browser$open()
url <- 'https://fantasy.premierleague.com/a/leagues/standings/1/classic'
browser$navigate(url)
browser$screenshot(display = TRUE)
pJS$stop()


I installed PhantomJS and the RSelenium prerequisites on Ubuntu as follows:

#Install/Update system software
sudo apt-get update
sudo apt-get install build-essential chrpath libssl-dev libxft-dev
#Install packages dependencies
sudo apt-get install libfreetype6 libfreetype6-dev
sudo apt-get install libfontconfig1 libfontconfig1-dev
#Download PhantomJS
cd ~
export PHANTOM_JS="phantomjs-2.1.1-linux-x86_64"
wget https://bitbucket.org/ariya/phantomjs/downloads/$PHANTOM_JS.tar.bz2
sudo tar xvjf $PHANTOM_JS.tar.bz2
#Move PhantomJS to /usr/local/share/ and create a symlink:
sudo mv $PHANTOM_JS /usr/local/share
sudo ln -sf /usr/local/share/$PHANTOM_JS/bin/phantomjs /usr/local/bin
#Install prerequisites for RSelenium (Ubuntu package names are lowercase)
sudo apt-get install r-cran-xml
sudo apt-get install r-cran-rcurl
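
Since the question mentions PhantomJS 1.9.8 while the steps above download 2.1.1, it may be worth confirming which binary is actually first on the PATH of each machine. The following is a minimal sketch, not part of the original setup; `check_phantomjs` is a hypothetical helper name, and it assumes only that `phantomjs --version` prints the version number.

```shell
#!/bin/sh
# Report which phantomjs binary is first on PATH and which version it is.
# check_phantomjs is a hypothetical helper name, not part of PhantomJS.
check_phantomjs() {
    if ! command -v phantomjs >/dev/null 2>&1; then
        echo "phantomjs: not found on PATH"
        return 1
    fi
    echo "path:    $(command -v phantomjs)"
    echo "version: $(phantomjs --version)"
}

# Run the check; print a hint instead of failing if nothing is installed.
check_phantomjs || echo "hint: re-check the symlink in /usr/local/bin"
```

Running this on both the AWS and the DigitalOcean machine and comparing the two outputs would show whether they really use the same PhantomJS build.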


What could be causing this, and is there a workaround on DigitalOcean?

Answer

I see the same problem on Ubuntu 16.04 but not on Windows.

On Ubuntu 16.04 I also get the following output:

> browser$navigate(url)
[ERROR - 2016-10-31T13:04:20.726Z] Session [89254650-9f6a-11e6-b112-2f925adde426] - page.onError - msg: ReferenceError: Can't find variable: $

  phantomjs://platform/console++.js:263 in error
[ERROR - 2016-10-31T13:04:20.726Z] Session [89254650-9f6a-11e6-b112-2f925adde426] - page.onError - stack:
  global code (https://fantasy.premierleague.com/a/leagues/standings/1/classic:5988)

  phantomjs://platform/console++.js:263 in error

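Searching for these errors suggests that this is a known, currently open issue with PhantomJS/GhostDriver. The ReferenceError for $ indicates that a script the page depends on (probably jQuery) never loaded, which would be consistent with an SSL problem. A new release of PhantomJS/GhostDriver is in the works and will hopefully land soon. In the meantime, telling PhantomJS to ignore SSL errors worked for me: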

library(RSelenium)
# Start PhantomJS and tell it to ignore SSL errors
pJS <- phantom(extras = "--ignore-ssl-errors=true")
browser <- remoteDriver(browserName = "phantomjs")
browser$open()
url <- 'https://fantasy.premierleague.com/a/leagues/standings/1/classic'
browser$navigate(url)
browser$screenshot(display = TRUE)
# Shut down the PhantomJS process when done
pJS$stop()