MLyall MLyall - 29 days ago 7
R Question

calculate number of rows in a dataframe above threshold as a function or other column factors

I would like to get find the number for rows for each subject for each day where the value is greater than 11 and output these in a data frame for analysis. The data set is large (5000 rows) so need a function for this.

subject = c(rep("A", 12), rep("B", 12))
day = c(1,1,1,1,2,2,2,2,3,3,3,3,1,1,1,1,2,2,2,2,3,3,3,3)
value = c(13,14,15,5,12,9,6,14,19,2,13,13,13,14,15,5,12,9,6,14,19,2,13,13)
df = data.frame(subject, day, value)
df

subject day value
1 A 1 13
2 A 1 14
3 A 1 15
4 A 1 5
5 A 2 12
6 A 2 9
7 A 2 6
8 A 2 14
9 A 3 19
10 A 3 2
11 A 3 13
12 A 3 13
13 B 1 13
14 B 1 14
15 B 1 15
16 B 1 5
17 B 2 12
18 B 2 9
19 B 2 6
20 B 2 14
21 B 3 19
22 B 3 2
23 B 3 13
24 B 3 13


The output I would like would be

subject.agg = c(rep("A", 3), rep("B", 3))
day.agg = as.factor(c(1,2,3,1,2,3))
highvalues = (c(3,2,3,3,2,3))
df.agg = data.frame(subject.agg,day.agg,highvalues)
df.agg

subject.agg day.agg highvalues
1 A 1 3
2 A 2 2
3 A 3 3
4 B 1 3
5 B 2 2
6 B 3 3


Any help much would be very much appreciated.

Answer

One option is aggregate from base R

aggregate(cbind(highvalues=value>11)~., df,  sum)

Or with data.table

library(data.table)
setDT(df)[value>11, .(highvalues=.N), by = .(subject, day)]
#     subject day highvalues
#1:       A   1          3
#2:       A   2          2
#3:       A   3          3
#4:       B   1          3
#5:       B   2          2
#6:       B   3          3