setbackademic setbackademic - 2 months ago 8
R Question

create a summary data frame based on counts

I am trying to use a data frame to create a second data frame of summary counts. My original data has the structure of:

mydata <- read.table(header=TRUE, text="
item type store1 store2 store3 store4 store5
chair timber 0 1 4 0 6
chair metal 0 1 4 1 9
chair upholstered 3 0 0 1 1
table indoor 1 8 0 1 0
table outdoor 1 12 2 1 0
bed single 0 0 2 1 0
bed double 0 1 1 1 0
bed queen 1 0 0 1 3
bed king 5 0 1 3 0")


I want my summary data frame to count every type of furniture present at each store and give me a summary of which stock is at each store (just presence/absence, not number of items). It should look like this:

summary <- read.table(header=TRUE, text="
store chair_types table_types bed_types total_types
store1 1 2 2 5
store2 2 2 1 5
store3 2 1 3 6
store4 2 2 4 8
store5 3 0 1 4")


This would be easy in excel but I'm trying to bite the bullet and learn to do it properly. Apologies if this is a duplicate, but I couldn't find a similar example. Thanks in advance.

Answer

We can do this with dplyr/tidyr. After grouping by 'item', loop over the 'store' columns (summarise_each), get the number of non-zero elements in each 'store' column (sum(.!=0), convert to 'long' format (gather), paste the substriing '_types' to 'item', spread the 'long' format to 'wide', and create a 'total' column using rowSums

library(dplyr)
library(tidyr)
mydata %>% 
     group_by(item) %>%
     summarise_each(funs(sum(.!=0)), store1:store5) %>% 
     gather(store, val, store1:store5) %>% 
     mutate(item = paste0(item, "_types")) %>%
     spread(item, val) %>%
     mutate(total = rowSums(.[-1]))
#   store bed_types chair_types table_types total
#   <chr>     <int>       <int>       <int> <dbl>
#1 store1         2           1           2     5
#2 store2         1           2           2     5
#3 store3         3           2           1     6
#4 store4         4           2           2     8
#5 store5         1           3           0     4

This can also be done by first converting to 'long' format, grouped by 'item', 'store', get the number of non-zero elements (summarise), grouped by 'store', create the 'Total' column by adding up the 'val' and then spread

mydata %>% 
     gather(store, val, store1:store5) %>%
     group_by(item, store) %>% 
     summarise(val = sum(val!=0)) %>% 
     group_by(store) %>% 
     mutate(Total = sum(val)) %>% 
     spread(item, val)