Xavier Tibau Alberdi Xavier Tibau Alberdi - 1 month ago 20
R Question

Spread columns by count in R dplyr

I have a factor column. I would like to spread into one column for each factor and then fill the gaps by the count of that factor appears for each id. Suppose we have:

car <- c("a","b","b","b","c","c","a","b","b","b","c","c")
type <- c("good", "regular", "bad","good", "regular", "bad","good", "regular", "bad","good", "regular", "bad")
car_type <- data.frame(car,type)


and get:

car type
1 a good
2 b regular
3 b bad
4 b good
5 c regular
6 c bad
7 a good
8 b regular
9 b bad
10 b good
11 c regular
12 c bad


I want this:

> results
car good regular bad
1 a 2 0 0
2 b 2 2 2
3 c 0 2 2


I try this using dplyr, but I'm not really use to it, so It doesn't work.

car_type %>%
select(car, type) %>%
group_by(car) %>%
mutate(seq = unique(type)) %>%
spread(seq, type)


I would thanks any help.

Answer

With reshape2:

As @DavidArenburg pointed out, you don't need dplyr for this.

library(reshape2)

dcast(car_type, car ~ type)

If you were going to use dplyr, the code would be:

dplyr and reshape2

car_type %>% count(car, type) %>%
  dcast(car ~ type, fill=0)

dplyr and tidyr

car_type %>% count(car, type) %>%
  spread(type, n, fill=0)

In either case, count(car, type) is equivalent to

group_by(car, type) %>% tally

or

group_by(car, type) %>% summarise(n=n())

With data.table

library(data.table)

dcast(setDT(car_type), car ~ type, fill=0)