MLyall - 9 months ago 40

R Question

I would like to get find the number for rows for each subject for each day where the value is greater than 11 and output these in a data frame for analysis. The data set is large (5000 rows) so need a function for this.

`subject = c(rep("A", 12), rep("B", 12))`

day = c(1,1,1,1,2,2,2,2,3,3,3,3,1,1,1,1,2,2,2,2,3,3,3,3)

value = c(13,14,15,5,12,9,6,14,19,2,13,13,13,14,15,5,12,9,6,14,19,2,13,13)

df = data.frame(subject, day, value)

df

subject day value

1 A 1 13

2 A 1 14

3 A 1 15

4 A 1 5

5 A 2 12

6 A 2 9

7 A 2 6

8 A 2 14

9 A 3 19

10 A 3 2

11 A 3 13

12 A 3 13

13 B 1 13

14 B 1 14

15 B 1 15

16 B 1 5

17 B 2 12

18 B 2 9

19 B 2 6

20 B 2 14

21 B 3 19

22 B 3 2

23 B 3 13

24 B 3 13

The output I would like would be

`subject.agg = c(rep("A", 3), rep("B", 3))`

day.agg = as.factor(c(1,2,3,1,2,3))

highvalues = (c(3,2,3,3,2,3))

df.agg = data.frame(subject.agg,day.agg,highvalues)

df.agg

subject.agg day.agg highvalues

1 A 1 3

2 A 2 2

3 A 3 3

4 B 1 3

5 B 2 2

6 B 3 3

Any help much would be very much appreciated.

Answer Source

One option is `aggregate`

from `base R`

```
aggregate(cbind(highvalues=value>11)~., df, sum)
```

Or with `data.table`

```
library(data.table)
setDT(df)[value>11, .(highvalues=.N), by = .(subject, day)]
# subject day highvalues
#1: A 1 3
#2: A 2 2
#3: A 3 3
#4: B 1 3
#5: B 2 2
#6: B 3 3
```