Kendall Reid Kendall Reid - 2 months ago 6
R Question

How to create new rows in a data frame based on missing data in R

I would like to add new rows to a dataframe that I am working with, based on data that is missing from the dataframe.

Here is an example dataframe.

year <- c(2001,2001,2002,2002,2003,2004,2004,2005)
make <- c('Honda', 'Ford', 'Honda', 'Ford', 'Honda', 'Honda', 'Ford', 'Honda')
number_manufactured <- c(10, 20, 15, 47, 14, 19, 35, 9)

cars <- data.frame(year, make, number_manufactured)

I would like to add a row to the data frame for values that are missing with number_manufactured = 0, such as:
(2003, Ford, 0) and (2005, Ford, 0)

My desired data frame would be this:

year <- c(2001,2001,2002,2002,2003,2003,2004,2004,2005,2005)
make <- c('Honda', 'Ford', 'Honda', 'Ford', 'Honda','Ford', 'Honda', 'Ford', 'Honda', 'Ford')
number_manufactured <- c(10, 20, 15, 47, 14, 0, 19, 35, 9, 0)

cars <- data.frame(year, make, number_manufactured)

Thanks for the help!

lmo lmo

Here is a base R method using expand.grid and merge.

# get new data.frame
dfNew <- merge(cars, expand.grid(unique(cars$year), unique(cars$make)), 
               by.x=c("year", "make"), by.y=c("Var1", "Var2"), all=TRUE)
# fill in 0s
dfNew$number_manufactured[$number_manufactured)] <- 0

expand.grid returns a data.frame with all combinations of two vectors. Here, it is fed the unique levels of year and make. this is merged onto the original data.frame to produce the new data.frame, with new observations included using the all=TRUE argument. The new observations are NA for number manufactured, so the second line converts these to 0s.