beddotcom beddotcom - 18 days ago 5
R Question

Insert missing rows by factor level

I'm sure there's a simple solution to this problem, but I'm having trouble figuring it out. I have a data frame in the following format:

Number Category Type Count
1 X A 10
2 X B 14
3 Y B 3
4 Z A 14


"Type" is a factor with two levels, {A,B}, and each level gets at least one "Category" entry, (for simplicity, they are denoted XYZ here, but in my actual dataset there are too many to list). I would like to the number of rows each Type has to match by Category:

Number Category Type Count
1 X A 10
2 X B 14
3 Y A <NA>
4 Y B 3
5 Z A 14
6 Z B <NA>


For instance, if Type A is listed in four rows of Category A, but Type B has no Category A listings, then four new rows of Category A, Type B should be created (with Count=NA). Similarly, if Type A gets four rows of Category A and Type B has two, then two new rows should be created.

I was able to find numerous answers on how to do this for missing dates in time series data using
seq()
,
expand.grid()
, and
merge()
, but I can't quite see how to do it in this case. I hope this is clear... Grateful for any help!

dat <- read.table(header = TRUE, text =
"Number Category Type Count
1 X A 10
2 X B 14
3 Y B 3
4 Z A 14")

Answer

Use expand.grid to make a master list and then merge:

alllevs <- do.call(expand.grid, lapply(dat[c("Type","Category")], levels))
merge(dat, alllevs, all.y=TRUE)

#  Category Type Number Count
#1        X    A      1    10
#2        X    B      2    14
#3        Y    A     NA    NA
#4        Y    B      3     3
#5        Z    A      4    14
#6        Z    B     NA    NA