Kendall Reid - 3 months ago 13

R Question

I would like to add new rows to a dataframe that I am working with, based on data that is missing from the dataframe.

Here is an example dataframe.

```
year <- c(2001,2001,2002,2002,2003,2004,2004,2005)
make <- c('Honda', 'Ford', 'Honda', 'Ford', 'Honda', 'Honda', 'Ford', 'Honda')
number_manufactured <- c(10, 20, 15, 47, 14, 19, 35, 9)
cars <- data.frame(year, make, number_manufactured)
```

I would like to add a row to the data frame for each year/make combination that is missing, with `number_manufactured` = 0, such as:

(2003, Ford, 0) and (2005, Ford, 0)

My desired data frame would be this:

```
year <- c(2001,2001,2002,2002,2003,2003,2004,2004,2005,2005)
make <- c('Honda', 'Ford', 'Honda', 'Ford', 'Honda','Ford', 'Honda', 'Ford', 'Honda', 'Ford')
number_manufactured <- c(10, 20, 15, 47, 14, 0, 19, 35, 9, 0)
cars <- data.frame(year, make, number_manufactured)
```

Thanks for the help!

Answer

Here is a base R method using `expand.grid` and `merge`.

```
# get new data.frame
dfNew <- merge(cars, expand.grid(unique(cars$year), unique(cars$make)),
               by.x=c("year", "make"), by.y=c("Var1", "Var2"), all=TRUE)
# fill in 0s
dfNew$number_manufactured[is.na(dfNew$number_manufactured)] <- 0
```

`expand.grid` returns a data.frame containing every combination of the vectors it is given. Here, it is fed the unique values of `year` and `make`. That grid is then merged with the original data.frame, and the `all=TRUE` argument keeps the combinations that have no match in `cars` as new rows. Those new rows have NA for `number_manufactured`, so the second statement replaces the NAs with 0.
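To see each step separately, here is a minimal sketch of the same approach that rebuilds the example data and keeps the grid in its own variable. The names `all_combos` and `filled` are chosen only for illustration. Naming the `expand.grid` columns `year` and `make` is a small variation on the code above: it lets `merge` match on the shared column names, so `by.x` and `by.y` are not needed.

```r
# Rebuild the example data from the question
cars <- data.frame(
  year = c(2001, 2001, 2002, 2002, 2003, 2004, 2004, 2005),
  make = c("Honda", "Ford", "Honda", "Ford", "Honda", "Honda", "Ford", "Honda"),
  number_manufactured = c(10, 20, 15, 47, 14, 19, 35, 9),
  stringsAsFactors = FALSE
)

# Every year/make combination (illustrative name: all_combos).
# Naming the columns lets merge() join on the shared names directly.
all_combos <- expand.grid(
  year = unique(cars$year),
  make = unique(cars$make),
  stringsAsFactors = FALSE
)
nrow(all_combos)  # 5 years x 2 makes = 10 rows

# Outer join: combinations absent from cars come back with NA
filled <- merge(cars, all_combos, all = TRUE)

# Replace the NAs introduced by the join with 0
filled$number_manufactured[is.na(filled$number_manufactured)] <- 0

# Show only the rows that were added
filled[filled$number_manufactured == 0, ]
```

Note that `merge` sorts its result by the join columns, so within each year the rows come out in alphabetical order of `make` (Ford before Honda) rather than in the order shown in the question.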