I'm trying to build a data frame from two pieces of data I've scraped from IMDb: the first has 50 values and the second only 29. Is there an easy way to make R automatically fill the other 21 values it didn't find with NA?
My code:
imdb <- read_html("http://www.imdb.com/search/title?genres=horror&genres=mystery&sort=moviemeter,asc&view=advanced")
title <- html_nodes(imdb, '.lister-item-header a')
title <- html_text(title)
metascore <- html_nodes(imdb, '.ratings-metascore')
metascore <- html_text(metascore)
df <- data.frame(Title = title, Metascore = metascore)
Error in data.frame(Title = title, Metascore = metascore) :
arguments imply differing number of rows: 50, 29
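You need to change your fourth line. You want metascore to have as many elements as title, with NA for those titles that don't have a metascore listed. The way to do this is to extract the .lister-item-content nodes (one per search result) and then, from each of these, select the .ratings-metascore node if it exists, or NA if it doesn't. See ?html_nodes for the difference between html_node and html_nodes: html_node returns exactly one result for each input node (a missing node when nothing matches), whereas html_nodes returns every match, so results without a metascore simply drop out and the remaining scores no longer line up with their titles. I've also added span to the selector so that just the number is returned, without the following word 'metascore'.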
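To see the difference between the two functions without hitting IMDb, here is a minimal sketch on a hand-written HTML fragment. The film names and scores are hypothetical; only the class names mirror the selectors used here. It assumes rvest is installed (html_node() and html_nodes() are still available in rvest 1.x, where they are superseded by html_element() and html_elements()).

```r
library(rvest)

# Hypothetical, simplified markup mimicking three search results;
# "Film B" has no metascore block.
page <- read_html('
<div class="lister-item-content">
  <h3 class="lister-item-header"><a>Film A</a></h3>
  <div class="ratings-metascore"><span>62</span> Metascore</div>
</div>
<div class="lister-item-content">
  <h3 class="lister-item-header"><a>Film B</a></h3>
</div>
<div class="lister-item-content">
  <h3 class="lister-item-header"><a>Film C</a></h3>
  <div class="ratings-metascore"><span>84</span> Metascore</div>
</div>')

# One node per search result
items <- html_nodes(page, ".lister-item-content")

# html_nodes(): every match in the page, so the missing score just vanishes
all_scores <- html_text(html_nodes(page, ".ratings-metascore span"))
all_scores   # "62" "84"  (only 2 values for 3 results)

# html_node(): exactly one result per item, NA where nothing matches
per_item <- html_text(html_node(items, ".ratings-metascore span"))
per_item     # "62" NA "84"

# Titles extracted the same way, so both columns have one entry per result
titles <- html_text(html_node(items, ".lister-item-header a"))
df <- data.frame(Title = titles, Metascore = per_item)
```

Because both columns come from the same items node set, each row pairs a title with its own score, and the NA lands on the right film.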
library(rvest)

imdb <- read_html("http://www.imdb.com/search/title?genres=horror&genres=mystery&sort=moviemeter,asc&view=advanced")
title <- html_nodes(imdb, '.lister-item-header a')
title <- html_text(title)
# One .lister-item-content node per result, then at most one score from each (NA if absent)
metascore <- html_node(html_nodes(imdb, '.lister-item-content'), '.ratings-metascore span')
metascore <- html_text(metascore)
df <- data.frame(Title = title, Metascore = metascore)
head(df,10)
Title Metascore
1 Mother! <NA>
2 Annabelle: Creation 62
3 Stranger Things <NA>
4 Supernatural <NA>
5 It <NA>
6 The Vampire Diaries <NA>
7 Get Out 84
8 The Originals <NA>
9 Annabelle 37
10 Grimm <NA>
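Note that the literal answer to the question, padding the shorter vector with NA (for example with length<-), would make the lengths agree but attach scores to the wrong titles, because the 29 values from html_nodes() carry no information about which results were skipped. A base-R sketch with hypothetical values illustrates this:

```r
# Hypothetical vectors standing in for the scraped results;
# "Film B" has no metascore on the page.
title      <- c("Film A", "Film B", "Film C")
all_scores <- c("62", "84")      # shape of what html_nodes() returns
per_item   <- c("62", NA, "84")  # shape of what html_node() per result returns

# Naive fix: pad with NA at the end so the lengths agree
padded <- all_scores
length(padded) <- length(title)  # becomes c("62", "84", NA)

# Naive pairs Film B with 84 and leaves Film C empty; PerItem is aligned
print(data.frame(Title = title, Naive = padded, PerItem = per_item))
```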