beddotcom - 1 year ago 77
R Question

# Insert missing rows by factor level

I'm sure there's a simple solution to this problem, but I'm having trouble figuring it out. I have a data frame in the following format:

``````Number Category Type   Count
1      X        A      10
2      X        B      14
3      Y        B      3
4      Z        A      14
``````

"Type" is a factor with two levels, {A,B}, and each level gets at least one "Category" entry, (for simplicity, they are denoted XYZ here, but in my actual dataset there are too many to list). I would like to the number of rows each Type has to match by Category:

``````Number Category Type   Count
1      X        A      10
2      X        B      14
3      Y        A      <NA>
4      Y        B      3
5      Z        A      14
6      Z        B      <NA>
``````

For instance, if Type A is listed in four rows of Category A, but Type B has no Category A listings, then four new rows of Category A, Type B should be created (with Count=NA). Similarly, if Type A gets four rows of Category A and Type B has two, then two new rows should be created.

I was able to find numerous answers on how to do this for missing dates in time series data using
`seq()`
,
`expand.grid()`
, and
`merge()`
, but I can't quite see how to do it in this case. I hope this is clear... Grateful for any help!

``````dat <- read.table(header = TRUE, text =
"Number Category Type   Count
1      X        A      10
2      X        B      14
3      Y        B      3
4      Z        A      14")
``````

Use `expand.grid` to make a master list and then `merge`:

``````alllevs <- do.call(expand.grid, lapply(dat[c("Type","Category")], levels))
merge(dat, alllevs, all.y=TRUE)

#  Category Type Number Count
#1        X    A      1    10
#2        X    B      2    14
#3        Y    A     NA    NA
#4        Y    B      3     3
#5        Z    A      4    14
#6        Z    B     NA    NA
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download