DataProphets - 2 months ago
R Question

Webscraping in R with XML Package and which Function - "Null" error

I'm trying to get the table "Pass Targets" from http://www.pro-football-reference.com/boxscores/201609150buf.htm into R. However, only the first two tables appear to be available.

> sample1 = readHTMLTable("http://www.pro-football-reference.com/boxscores/201609150buf.htm", which = 1)
> sample2 = readHTMLTable("http://www.pro-football-reference.com/boxscores/201609150buf.htm", which = 2)
> sample3 = readHTMLTable("http://www.pro-football-reference.com/boxscores/201609150buf.htm", which = 3)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’


By my count, there are 20 tables on this page, of which I'm after #14. The fact that the code works for some of the tables, and not others, leads me to think it is an issue with this specific URL.

The header on this page has shortcut links to the individual tables, so I tried using one of those URLs instead, but it fetches the same tables and returns the same results.

> sample1a = readHTMLTable("http://www.pro-football-reference.com/boxscores/201609150buf.htm", which = 1)
> sample1a[1:5, ]
  1
1 via Sports Logos.net\n\tAbout logos New York Jets 6
2 via Sports Logos.net\n\tAbout logos Buffalo Bills 7
NA <NA> <NA> <NA>
NA.1 <NA> <NA> <NA>
NA.2 <NA> <NA> <NA>
2 3 4 Final
1 14 7 10 37
2 3 14 7 31
NA <NA> <NA> <NA> <NA>
NA.1 <NA> <NA> <NA> <NA>
NA.2 <NA> <NA> <NA> <NA>
> sample1b = readHTMLTable("http://www.pro-football-reference.com/boxscores/201609150buf.htm#all_vis_snap_counts", which = 1)
> sample1b[1:5, ]
  1
1 via Sports Logos.net\n\tAbout logos New York Jets 6
2 via Sports Logos.net\n\tAbout logos Buffalo Bills 7
NA <NA> <NA> <NA>
NA.1 <NA> <NA> <NA>
NA.2 <NA> <NA> <NA>
2 3 4 Final
1 14 7 10 37
2 3 14 7 31
NA <NA> <NA> <NA> <NA>
NA.1 <NA> <NA> <NA> <NA>
NA.2 <NA> <NA> <NA> <NA>
> sample3 = readHTMLTable("http://www.pro-football-reference.com/boxscores/201609150buf.htm#all_vis_snap_counts", which = 3)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’


I've read many helpful threads related to this topic. However, I felt compelled to post because this problem seems to be unique to this webpage.

Your help is greatly appreciated!

Answer

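The markup for the table you're looking for is commented out in the page source, which is why you can't get it using readHTMLTable: the parser only sees tables that are live in the document, so any `which` index beyond those returns NULL. Here's how to extract the comment, and then extract the table from the comment, using rvest.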

library(rvest)
page <- read_html("http://www.pro-football-reference.com/boxscores/201609150buf.htm")

# first get all the comments in the page
comments <- page %>% 
  html_nodes(xpath = "//comment()") 

# select the comment you want (index 52 is specific to this page's
# markup at the time of writing and may shift), convert it to text,
# parse that text as html, extract the table node, and
# finally, convert to table
pass_targets <- comments[52] %>%
  html_text() %>% 
  read_html() %>% 
  html_node("#targets_directions") %>% 
  html_table()
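
The index 52 above depends on how many comments the page happens to contain, so it can break if the site changes its markup. A minimal sketch of a less position-dependent variant is to search the comment text for the table's id instead. The inline HTML string and its values below are made up for illustration (a stand-in for the real page, so the example runs without a network connection); only the id targets_directions comes from the answer above.

```r
library(rvest)

# Hypothetical stand-in for the real page: one live table and one
# table wrapped in an HTML comment. The cell values are invented.
html <- '<html><body>
<table id="scoring"><tr><th>Tm</th><th>Final</th></tr><tr><td>NYJ</td><td>37</td></tr></table>
<!--
<table id="targets_directions"><tr><th>Player</th><th>Tgt</th></tr><tr><td>Example Player</td><td>5</td></tr></table>
-->
</body></html>'

page <- read_html(html)

# get the text of every comment in the page
comment_text <- page %>%
  html_nodes(xpath = "//comment()") %>%
  html_text()

# find the comment(s) whose text mentions the table id
hit <- grep('id="targets_directions"', comment_text, fixed = TRUE)

# parse the first matching comment as html and pull the table out of it
pass_targets <- comment_text[hit[1]] %>%
  read_html() %>%
  html_node("#targets_directions") %>%
  html_table()
```

For the real page, replacing the html string with the URL in read_html() should be the only change needed, assuming the table is still delivered inside a comment and still carries that id.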