beddotcom - 7 months ago 46

R Question

I'm sure there's a simple solution to this problem, but I'm having trouble figuring it out. I have a data frame in the following format:

`Number Category Type Count`

1 X A 10

2 X B 14

3 Y B 3

4 Z A 14

"Type" is a factor with two levels, {A,B}, and each level gets

`Number Category Type Count`

1 X A 10

2 X B 14

3 Y A <NA>

4 Y B 3

5 Z A 14

6 Z B <NA>

For instance, if Type A is listed in four rows of Category A, but Type B has no Category A listings, then four new rows of Category A, Type B should be created (with Count=NA). Similarly, if Type A gets four rows of Category A and Type B has two, then two new rows should be created.

I was able to find numerous answers on how to do this for missing dates in time series data using

`seq()`

`expand.grid()`

`merge()`

`dat <- read.table(header = TRUE, text =`

"Number Category Type Count

1 X A 10

2 X B 14

3 Y B 3

4 Z A 14")

Answer

Use `expand.grid`

to make a master list and then `merge`

:

```
alllevs <- do.call(expand.grid, lapply(dat[c("Type","Category")], levels))
merge(dat, alllevs, all.y=TRUE)
# Category Type Number Count
#1 X A 1 10
#2 X B 2 14
#3 Y A NA NA
#4 Y B 3 3
#5 Z A 4 14
#6 Z B NA NA
```

Source (Stackoverflow)