Gabriele, 4 months ago

NoSuchElementException scraping ESPN with RSelenium

I'm using R (and RSelenium) to scrape data from ESPN. It's not the first time I've used it, but this time I'm getting an error I can't sort out.

Consider this page: http://en.espn.co.uk/premiership-2011-12/rugby/match/142562.html

Let's try to scrape the timeline. If I inspect the page, I get the CSS selector

#liveLeft


As usual, I go with

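library(RSelenium)

checkForServer()         # make sure the standalone Selenium server jar is available
remDr <- remoteDriver()  # connects to localhost:4444 by default
remDr$open()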

matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"


url <- paste0("http://en.espn.co.uk/",leagueString,"-",seasonString,"/rugby/match/",matchId,".html")

remDr$navigate(url)


and the page correctly loads. So far so good. Now when I try to get the nodes with

div <- remDr$findElement(using = 'css selector', '#liveLeft')


I get back

Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.


I'm puzzled. I also tried XPath, and it doesn't work either. I tried selecting other elements of the page too, with no luck. The only selector that returns something is

#scrumContent

Answer

From the comments.

The element resides in an iframe, so it isn't available to select from the top-level page. You can see this by running document.getElementById('liveLeft') in the JavaScript console in Chrome: on the full page it returns null (i.e. the element doesn't exist in that document), even though it is clearly visible. To get around this, simply load the iframe instead.
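The console check described above can also be run from R. The sketch below is a minimal, hypothetical helper (`element_in_current_document()` is not part of RSelenium); it assumes `remDr` is an open RSelenium `remoteDriver` and uses its `executeScript()` method to evaluate `document.getElementById()` in whichever document the driver currently has focus on.

```r
# Hypothetical helper (not part of RSelenium).
# Returns TRUE if an element with the given id exists in the document the
# driver is currently focused on (the top-level page, or a frame after
# switchToFrame()), FALSE otherwise.
# `remDr` is assumed to be an open RSelenium remoteDriver.
element_in_current_document <- function(remDr, id) {
  res <- remDr$executeScript(
    "return document.getElementById(arguments[0]) !== null;",
    args = list(id)
  )
  # executeScript() may wrap the result in a list, so flatten it first
  isTRUE(unlist(res))
}
```

Called right after navigating to the full match page, e.g. `element_in_current_document(remDr, "liveLeft")`, it should return `FALSE` if the element really lives inside the iframe.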

If you inspect the page, you will see that the src of the iframe is /premiership-2011-12/rugby/current/match/142562.html?view=scorecard for the example provided. Navigating to this page instead of the 'full' page makes the element 'visible' to RSelenium, so it can be selected.

library(RSelenium)

checkForServer()
remDr <- remoteDriver()
remDr$open()

matchId <- "142562"
leagueString <- "premiership"
seasonString <- "2011-12"

# Amend the URL so that it returns the iframe's own page
url <- paste0("http://en.espn.co.uk/", leagueString, "-", seasonString, "/rugby/current/match/", matchId, ".html?view=scorecard")

remDr$navigate(url)

div <- remDr$findElement(using = 'css selector', '#liveLeft')

UPDATE

If it is more convenient to load the iframe's contents into a variable and traverse that, the following example (run in the browser's JavaScript console) shows how.

document.getElementById('liveLeft') // returns null, as the iframe has a separate DOM

var doc = document.getElementById('win_old').contentDocument // loads the iframe's DOM into the variable doc
doc.getElementById('liveLeft') // now returns the desired element
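Instead of navigating to the iframe's URL, RSelenium can also move its focus into the iframe with `switchToFrame()`, which accepts a located element; the RSelenium documentation also describes passing `NULL` to return to the page's default content. The helper below is a hypothetical sketch, not part of RSelenium. It assumes `remDr` is an open `remoteDriver` already on the full match page and that the iframe still has the id `win_old` shown above.

```r
# Hypothetical helper (not part of RSelenium).
# Switches the driver's focus into an iframe, finds an element inside it,
# returns that element's text, and then switches back to the top-level page.
# `remDr` is assumed to be an open RSelenium remoteDriver that has already
# navigated to the page containing the iframe.
text_in_iframe <- function(remDr, frame_css, target_css) {
  frame <- remDr$findElement(using = "css selector", frame_css)
  remDr$switchToFrame(frame)           # move focus into the iframe's own DOM
  on.exit(remDr$switchToFrame(NULL))   # restore the top-level page on exit
  target <- remDr$findElement(using = "css selector", target_css)
  unlist(target$getElementText())      # getElementText() returns a list
}
```

For example, `text_in_iframe(remDr, "#win_old", "#liveLeft")` after `remDr$navigate(url)` with the question's original URL. Because `on.exit()` also runs when the inner `findElement()` fails, focus goes back to the top-level page either way, so later lookups on the main page keep working.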