Tsuyoshi Endo - 3 months ago
R Question

Efficient way to implement sumif and countif in R

When I implement countif and sumif in R,
I always use the sapply and table functions, like this:

symbol = letters[sample(1:3, 5, replace=TRUE)]
df = data.frame(a=symbol,
                b=seq_along(symbol))


#sumif: for each unique key, sum the values of b whose a matches it
summary = data.frame(key=unique(df$a))
summary$sum = sapply(
  seq_len(nrow(summary)),
  function(i) sum(df$b[df$a == summary$key[i]])
)

#countif: number of rows for each key
countif = data.frame(
  key=names(table(df$a)),
  count=as.vector(table(df$a))
)

#join the sums and the counts on the key column
summary = merge(
  summary,
  countif,
  by="key"
)


Is there a more efficient method?

Answer

We can use data.table for efficiency. Convert the 'data.frame' to a 'data.table' with setDT(df), group by 'a', and compute the sum of 'b' and the number of rows in each group (.N).

library(data.table)
setDT(df)[, .(sum = sum(b), count = .N), .(key = a)]
#    key sum count
#1:   c   1     1
#2:   a   6     2
#3:   b   8     2
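
Because 'symbol' is drawn with sample(), the printed result changes from run to run. As a small sketch, the fixed input below is an assumption inferred from the sums and counts shown above (it is not the poster's actual data); running the same data.table call on it should give the same three groups, listed in order of first appearance.

```r
library(data.table)

# Fixed input (an assumption inferred from the printed sums and counts)
df <- data.frame(a = c("c", "a", "b", "a", "b"), b = 1:5)

# Same call as above: sum of b and row count (.N) per value of a
res <- setDT(df)[, .(sum = sum(b), count = .N), by = .(key = a)]
res
```

Note that setDT() converts 'df' in place, so 'df' is a data.table afterwards.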

Another option is dplyr:

library(dplyr)
df %>%
   group_by(key = a) %>%
   summarise(sum = sum(b), count = n())
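
If adding a package is not an option, the same summary can probably be had in base R without the per-key sapply loop. This is only a sketch: it swaps in tapply() for the sum and reuses table() for the count, and the fixed 'df' is a hypothetical stand-in for the random data in the question.

```r
# Fixed example data (hypothetical; the question draws `a` at random)
df <- data.frame(a = c("c", "a", "b", "a", "b"), b = 1:5)

# sumif: total of b within each value of a
sums <- tapply(df$b, df$a, sum)

# countif: number of rows for each value of a
counts <- table(df$a)

# Combine; rows follow the sorted values of a, and counts are
# looked up by name so the two columns stay aligned
res <- data.frame(
  key   = names(sums),
  sum   = as.vector(sums),
  count = as.vector(counts[names(sums)])
)
res
```

Unlike the data.table result, the rows here come out sorted by key rather than in order of first appearance.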