Marina Alves - 9 months ago
R Question

Scraping with rvest: how to fill missing values with NA to build a data frame?

I'm trying to build a data frame from two sets of values I've scraped from IMDb: the first has 50 values and the second has only 29. Is there an easy way to have R automatically fill in NA for the 21 values it didn't find?

My code:

library(rvest)
imdb <- read_html("http://www.imdb.com/search/title?genres=horror&genres=mystery&sort=moviemeter,asc&view=advanced")
title <- html_nodes(imdb, '.lister-item-header a')
title <- html_text(title)
metascore <- html_nodes(imdb, '.ratings-metascore')
metascore <- html_text(metascore)
df <- data.frame(Title = title, Metascore = metascore)
Error in data.frame(Title = title, Metascore = metascore) :
arguments imply differing number of rows: 50, 29


Thank you!

Answer

You need to change your fourth line. You want metascore to have as many elements as title, with NA for the titles that don't have a Metascore listed. html_nodes() returns only the nodes that match, which is why you get 29 values instead of 50. The way around this is to extract the .lister-item-content nodes first and then call html_node() on them: it returns exactly one result per item, the .ratings-metascore node if it exists and a missing node (which html_text() turns into NA) if it doesn't. See ?html_nodes for the difference between html_node and html_nodes. I've also added span to the selector so that just the number is returned, without the following word 'Metascore'.
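To see the difference between the two functions in isolation, here is a minimal sketch that runs offline on a hand-written page. The class names mirror the ones used on the IMDb listing, but the HTML itself, the titles, and the scores are made up for illustration. In rvest 1.0 and later, html_elements() and html_element() are the preferred names for the same two functions; the older names still work.

```r
library(rvest)

# A hand-written stand-in for a listing page: three items, only two with a score.
# (Hypothetical markup; only the class names are taken from the question.)
page <- read_html('
  <div class="lister-item-content"><h3 class="lister-item-header"><a>Film A</a></h3>
    <div class="ratings-metascore"><span>62</span> Metascore</div></div>
  <div class="lister-item-content"><h3 class="lister-item-header"><a>Show B</a></h3></div>
  <div class="lister-item-content"><h3 class="lister-item-header"><a>Film C</a></h3>
    <div class="ratings-metascore"><span>84</span> Metascore</div></div>
')

items <- html_nodes(page, ".lister-item-content")

# html_nodes() on the whole page keeps only the matches that exist: 2 values.
found_only <- html_text(html_nodes(page, ".ratings-metascore span"))

# html_node() on the item set returns one result per item: 3 values,
# with NA where an item has no matching node.
per_item <- html_text(html_node(items, ".ratings-metascore span"))

title <- html_text(html_nodes(page, ".lister-item-header a"))

# Both columns now have 3 rows, so data.frame() no longer complains.
df <- data.frame(Title = title, Metascore = per_item)
```

The same pattern, applied to the real IMDb page, is what the code that follows does.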

library(rvest)
imdb <- read_html("http://www.imdb.com/search/title?genres=horror&genres=mystery&sort=moviemeter,asc&view=advanced")
title <- html_nodes(imdb, '.lister-item-header a')
title <- html_text(title)
metascore <- html_node(html_nodes(imdb, '.lister-item-content'), '.ratings-metascore span')
metascore <- html_text(metascore)
df <- data.frame(Title = title, Metascore = metascore)

head(df,10)
                 Title  Metascore
1              Mother!       <NA>
2  Annabelle: Creation 62        
3      Stranger Things       <NA>
4         Supernatural       <NA>
5                   It       <NA>
6  The Vampire Diaries       <NA>
7              Get Out 84        
8        The Originals       <NA>
9            Annabelle 37        
10               Grimm       <NA>
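The scores in the output above are still text and carry trailing whitespace. If you want a numeric column, something along these lines should work. This is a sketch, not part of the original answer: the small data frame is rebuilt by hand from four of the rows shown above so the snippet runs offline, and the exact amount of padding is an assumption. Passing trim = TRUE to html_text() when scraping would remove the padding at the source instead.

```r
# Four rows copied from the output above; the padding length is assumed.
df <- data.frame(
  Title = c("Mother!", "Annabelle: Creation", "Stranger Things", "Get Out"),
  Metascore = c(NA, "62        ", NA, "84        "),
  stringsAsFactors = FALSE
)

# trimws() strips the padding; as.numeric() leaves NA as NA.
df$Metascore <- as.numeric(trimws(df$Metascore))

# Keep only the titles that actually have a score.
scored <- df[!is.na(df$Metascore), ]
```

With a numeric column you can sort or average the scores directly, for example with mean(df$Metascore, na.rm = TRUE).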