Asked by tohster, 8 days ago
Python Question

Setting timeout on selenium webdriver.PhantomJS

The situation

I have a simple Python script that gets the HTML source for a given URL:

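from selenium import webdriver
url = "http://example.com"  # placeholder URL
browser = webdriver.PhantomJS()
browser.get(url)  # blocks until the page and its resources have loaded
content = browser.page_source
browser.quit()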


Occasionally, the URL points to a page with slow-loading external resources (e.g. video files or very slow advertising content).

WebDriver waits until those resources have loaded before completing the .get(url) request.

Note: For unrelated reasons, I need to do this with PhantomJS rather than requests or urllib2.

The question

I'd like to set a timeout on PhantomJS resource loading so that if a resource takes too long to load, the browser gives up on it and treats it as missing.

This would let me read .page_source afterwards, based on whatever the browser has managed to load.

Documentation on webdriver.PhantomJS is very thin, and I haven't found a similar question on Stack Overflow.

Thanks in advance!

Answer

PhantomJS provides a resourceTimeout page setting, which might suit your needs. Quoting the documentation:

(in milli-secs) defines the timeout after which any resource requested will stop trying and proceed with other parts of the page. onResourceTimeout callback will be called on timeout.

In Ruby, for example, you can do something like:

require 'selenium-webdriver'

capabilities = Selenium::WebDriver::Remote::Capabilities.phantomjs("phantomjs.page.settings.resourceTimeout" => "5000")
driver = Selenium::WebDriver.for :phantomjs, :desired_capabilities => capabilities

In Python, I believe it's something like the following (untested; it only shows the logic, so you may need to adapt it):

from selenium import webdriver
driver = webdriver.PhantomJS(desired_capabilities={'phantomjs.page.settings.resourceTimeout': '5000'})
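A slightly fuller sketch of the Python line above: it copies Selenium's default PhantomJS capabilities before adding the timeout, so default entries such as browserName are kept, and it always closes the browser. The helper names with_resource_timeout and fetch_page_source are hypothetical. It assumes an older Selenium release that still ships webdriver.PhantomJS, and it passes the timeout as an integer number of milliseconds instead of the string '5000' (whether PhantomJS treats the two identically is an assumption, not verified here). Like the snippet above, it has not been tested against a live PhantomJS.

```python
# Capability key taken from the answer above; helper names are hypothetical.
RESOURCE_TIMEOUT_KEY = "phantomjs.page.settings.resourceTimeout"


def with_resource_timeout(base_caps, timeout_ms):
    """Return a copy of base_caps with PhantomJS's resourceTimeout set.

    Copying avoids mutating the shared DesiredCapabilities.PHANTOMJS dict
    and keeps its default entries (browserName, javascriptEnabled, ...).
    """
    caps = dict(base_caps)
    caps[RESOURCE_TIMEOUT_KEY] = timeout_ms  # milliseconds, assumed integer
    return caps


def fetch_page_source(url, timeout_ms=5000):
    """Load url in PhantomJS, abandoning any resource slower than timeout_ms."""
    # Imported lazily so the helper above works without Selenium installed.
    # Assumption: an older Selenium that still provides webdriver.PhantomJS
    # (it was deprecated in later 3.x releases and removed in Selenium 4).
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    caps = with_resource_timeout(DesiredCapabilities.PHANTOMJS, timeout_ms)
    driver = webdriver.PhantomJS(desired_capabilities=caps)
    try:
        driver.get(url)  # slow resources should now time out instead of blocking
        return driver.page_source
    finally:
        driver.quit()  # shut down the PhantomJS process even on errors
```

With phantomjs on your PATH, usage would be something like content = fetch_page_source(url, timeout_ms=5000).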