shoorideh - 3 months ago
R Question

apply function for each subgroup

I was wondering how I can use a loop function to calculate

apply(table(data$people,data$event),2,function(x) mean(x[x>0]))


for each level of Colour. That is, I want to evaluate the expression above separately on the rows that belong to each level of Colour.

people <-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6")
event<-c("a","b","b","M","s","f","y","b","a","a","a","a","s","c","c","b","m","a")
Colour<-c("red","blue","green","pink","red","blue","grean","red","red","black","pink","blue","blue","green","blue","green","green","red")

data<-data.frame(people,event,Colour)

Answer

To apply your calculation to each group, let's first wrap it in a function:

your_function = function(data) {
    apply(table(data$people,data$event),2,function(x) mean(x[x>0]))
}

Then we can split your data up by Colour and apply your function to each sub-data-frame:

dat_split = split(data, f = data$Colour)
results = lapply(dat_split, your_function)

results
# $black
#   a   b   c   f   m   M   s   y 
#   1 NaN NaN NaN NaN NaN NaN NaN 
#
# $blue
#   a   b   c   f   m   M   s   y 
#   1   1   1   1 NaN NaN   1 NaN 
#
# $grean
#   a   b   c   f   m   M   s   y 
# NaN NaN NaN NaN NaN NaN NaN   1 
# ...
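
If one table is easier to read than a list, the per-Colour vectors can be bound into a matrix. This is only a sketch: the names mean_nonzero_counts and result_matrix are made up for illustration, and it assumes the columns are factors (the default before R 4.0, and what the output above reflects), so that every subset keeps all event levels. With character columns, each subset's table would contain only the events present in that subset and the rows would not line up.

```r
# Example data copied from the question (typos such as "grean" kept as posted).
people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4",
            "R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a",
            "a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red",
            "black","pink","blue","blue","green","blue","green","green","red")

# Assumption: force factors so each subset keeps every people/event level.
data <- data.frame(people, event, Colour, stringsAsFactors = TRUE)

# Hypothetical name for the function from the answer: for each event,
# average the non-zero per-person counts.
mean_nonzero_counts <- function(d) {
  apply(table(d$people, d$event), 2, function(x) mean(x[x > 0]))
}

# One named vector per Colour, then stack them as rows of a matrix.
results <- lapply(split(data, f = data$Colour), mean_nonzero_counts)
result_matrix <- do.call(rbind, results)
result_matrix
```

Rows are Colour levels and columns are events; events that never occur within a Colour show up as NaN, because the mean of an empty vector is NaN.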

Personally, I don't find this output very friendly. data.table and dplyr make doing things to subsets of data frames easy. I would have used dplyr from the start, counting rows per person and then averaging those counts, like this:

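library(dplyr)
# Count rows per person within each Colour/event pair; only pairs that
# actually occur get a row, so there are no zero counts to filter out
data %>% group_by(people, Colour, event) %>%
    summarize(n = n()) %>%
    # Average those counts over people for each Colour/event pair
    group_by(Colour, event) %>%
    summarize(mean = mean(n)) %>%
    # Reshape to one row per Colour and one column per event
    tidyr::spread(key = event, value = mean)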

# Source: local data frame [6 x 9]
#
#   Colour     a     b     c     f     m     M     s     y
#   (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
# 1  black     1    NA    NA    NA    NA    NA    NA    NA
# 2   blue     1     1     1     1    NA    NA     1    NA
# 3  grean    NA    NA    NA    NA    NA    NA    NA     1
# ...
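
Since data.table is mentioned above but not shown, here is a rough equivalent of the dplyr pipeline. Treat it as a sketch rather than the one right way to do it: it assumes the data.table package is installed, and the names dt, counts, means, mean_n and wide are made up for illustration.

```r
library(data.table)

# Example data copied from the question (typos such as "grean" kept as posted).
people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4",
            "R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a",
            "a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red",
            "black","pink","blue","blue","green","blue","green","green","red")
dt <- data.table(people, event, Colour)

# Step 1: count rows per person within each Colour/event pair.
counts <- dt[, .(n = .N), by = .(people, Colour, event)]

# Step 2: average those counts over people for each Colour/event pair.
means <- counts[, .(mean_n = mean(n)), by = .(Colour, event)]

# Step 3: reshape to one row per Colour and one column per event.
wide <- dcast(means, Colour ~ event, value.var = "mean_n")
wide
```

As with the dplyr version, Colour/event pairs that never occur come out as NA rather than NaN, because they are simply absent from the counts instead of being averaged over an empty set.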