Cyrus Mohammadian - 3 months ago
R Question

Web scraping using R's rvest package and RSelenium

I usually have no problem scraping HTML tables using the read_html command from rvest. However, I'm having some trouble with a particular website. Any help would be much appreciated. Here's my workflow:

#Dependencies
library(rvest)
library(pipeR)

#Scrape table from site
url2 <- "http://priceonomics.com/hotels/rankings/#airbnb-apartments-all"
data2 <- url2 %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="airbnb-apartments-all"]/table') %>%
  html_table(fill = TRUE)
data2 <- data2[[1]]


What I end up with is a table with the correct column headings but no data! I would like to scrape the 2nd table on that site. Thanks in advance!

data2
[1] Rank City $
<0 rows> (or 0-length row.names)


I used Google Chrome to identify the XPath. I've also tried the following:

library(XML)  # readHTMLTable() comes from the XML package
readHTMLTable(url2)


Which produces:

$`NULL`
NULL

$`NULL`
NULL

$`NULL`
NULL


Finally, in case the website is using JavaScript, I tried using R's RSelenium package, but I can't seem to connect to the server properly:

library(RSelenium)
checkForServer()
startServer()
remDr <- remoteDriver(browserName = "firefox", port = 4444)
remDr$open(silent = TRUE)
Undefined error in RCurl call.
Error in queryRD(paste0(serverURL, "/session"), "POST", qdata = toJSON(serverOpts)) :

Answer

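All the data is returned in a separate JSON file rather than being embedded in the page's HTML, which would explain why read_html only picks up the column headings. You can probably construct the tables from that file. For example, the first table: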

library(jsonlite)
library(data.table)

# Fetch the JSON that holds the data behind the tables
appData <- fromJSON("http://priceonomics.com/static/js/hotels/all_data.json")

# Replicate the first table: one row per city, with the price stored under air$apt$p
myDf <- data.frame(City = names(appData),
                   Price = sapply(appData, function(x) x$air$apt$p),
                   stringsAsFactors = FALSE)
setDT(myDf)

# Ten most expensive cities
myDf[order(Price, decreasing = TRUE)][1:10]
                 City Price
 1:        Boston, MA 185.0
 2:      New York, NY 180.0
 3: San Francisco, CA 165.0
 4:     Cambridge, MA 155.0
 5:    Scottsdale, AZ 142.5
 6:     Charlotte, NC 139.5
 7:    Charleston, SC 139.5
 8:     Las Vegas, NV 135.0
 9:         Miami, FL 135.0
10:       Chicago, IL 130.0
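
The other tables can presumably be built the same way by pointing at a different path inside each city's entry. Below is a rough sketch of a reusable version, not a tested solution for this site. It uses a small hand-made list in place of appData so it runs offline. Only the air$apt$p path and the two real prices come from the answer above. The "Nowhere, XX" entry and the helper names get_path and build_table are made up for illustration. Use str(appData[[1]]) to find the actual path for the table you want.

```r
# Stand-in for appData: a nested list keyed by city. The structure is assumed
# from the answer's x$air$apt$p; "Nowhere, XX" is invented to show a missing value.
sample_data <- list(
  "Boston, MA"   = list(air = list(apt = list(p = 185))),
  "New York, NY" = list(air = list(apt = list(p = 180))),
  "Nowhere, XX"  = list(air = list())
)

# Walk a vector of names into a nested list; return NA if any level is missing.
# Assumes the value at the end of the path is a single number.
get_path <- function(x, path) {
  for (key in path) {
    if (!is.list(x) || is.null(x[[key]])) return(NA_real_)
    x <- x[[key]]
  }
  as.numeric(x)
}

# One row per city, with the value found at `path` (NA where it is absent)
build_table <- function(data, path) {
  data.frame(City  = names(data),
             Price = vapply(data, get_path, numeric(1), path = path,
                            USE.NAMES = FALSE),
             stringsAsFactors = FALSE)
}

tbl <- build_table(sample_data, c("air", "apt", "p"))
tbl[order(tbl$Price, decreasing = TRUE), ]
```

Using vapply with an explicit NA fallback keeps Price a plain numeric column even if some cities lack the field, whereas sapply would fall back to a list in that case.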