Tsuyoshi Endo - 1 year ago

R Question

When I implement countif and sumif in R, I always use the `sapply` and `table` functions, like this:

```
symbol = letters[sample(1:3, 5, replace=TRUE)]
df = data.frame(a=symbol,
                b=seq_len(length(symbol)))

# sumif
summary = data.frame(key=unique(df$a))
summary$sum = sapply(
  seq_len(nrow(summary)),
  function(i) with(df, sum(df$b[a==summary$key[i]]))
)

# countif
countif = data.frame(
  key=names(table(df$a)),
  count=as.vector(table(df$a))
)

summary = merge(
  summary,
  countif,
  c("key")
)
```

Is there a more efficient way to do this?

Answer

We can use `data.table` for efficiency. Convert the 'data.frame' to a 'data.table' with `setDT(df)`, then, grouped by 'a', compute the `sum` of 'b' and the number of elements in each group (`.N`).

```
library(data.table)
setDT(df)[, .(sum = sum(b), count = .N), .(key = a)]
# key sum count
#1: c 1 1
#2: a 6 2
#3: b 8 2
```

Or another option is `dplyr`

```
library(dplyr)
df %>%
  group_by(key = a) %>%
  summarise(sum = sum(b), count = n())  # n() is dplyr's group size; .N only works in data.table
```
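
For comparison, the same grouped sum and count can also be sketched in base R with `tapply`, without loading any package. This is only an illustration: the fixed values in `df` and the name `summary_df` are assumptions chosen here so the result is reproducible, not part of the original question.

```r
# Fixed example data (hypothetical values, replacing the random sample
# so the result below is reproducible)
df <- data.frame(a = c("c", "a", "b", "a", "b"), b = 1:5)

# "sumif": sum of b within each value of a
sums <- tapply(df$b, df$a, sum)

# "countif": number of rows within each value of a
counts <- tapply(df$b, df$a, length)

# Combine both into one summary table; groups come out sorted by key
summary_df <- data.frame(
  key   = names(sums),
  sum   = as.vector(sums),
  count = as.vector(counts)
)

print(summary_df)
# a: sum 6, count 2; b: sum 8, count 2; c: sum 1, count 1
```

This avoids the explicit `sapply` loop and the `merge` step from the question, though on large data the `data.table` version above is likely to be faster.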