hitchhiker - 27 days ago
R Question

Apply function (stat test) to subsets of data for each factor level

I'm new to R. I've looked through many similar questions but not found anything that has helped me solve my problem.

Say I have a data frame dat created like so:

dat <- data.frame(v1=rep(c("a","a","b","b"),3), v2=c(rep("x",4),rep("y",4),rep("z",4)), dv=sample(1:100, 12), id=rep(c("p1","p2"),6))

...that looks like this:

v1 v2 dv id
1 a x 40 p1
2 a x 99 p2
3 b x 67 p1
4 b x 24 p2
5 a y 16 p1
6 a y 51 p2
7 b y 85 p1
8 b y 72 p2
9 a z 33 p1
10 a z 31 p2
11 b z 88 p1
12 b z 50 p2

I would like, for each level of v2, to run a t-test for the difference between conditions a and b of v1.
I could do this by subsetting the data frame by level of v2 and then looping over the subsets, applying the t-test to each one, but as I understand it, one of R's strengths is avoiding explicit loops (using apply and related functions).

(Then I would of course correct for multiple comparisons)


One option is the apply family of functions.

First you split your data by the levels of v2, then you apply a function to each subset.
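To make the splitting step concrete, here is a minimal sketch of what split() hands back. The set.seed() call is my addition (it is not in the question) so that the sampled dv values are reproducible.

```r
# Rebuild the example data from the question.
# set.seed() is an assumption added here for reproducibility.
set.seed(1)
dat <- data.frame(v1 = rep(c("a", "a", "b", "b"), 3),
                  v2 = c(rep("x", 4), rep("y", 4), rep("z", 4)),
                  dv = sample(1:100, 12),
                  id = rep(c("p1", "p2"), 6))

# split() returns a named list with one data frame per level of v2
split_dat <- split(dat, dat$v2)

names(split_dat)         # "x" "y" "z"
sapply(split_dat, nrow)  # 4 rows in each subset
```

Each element of the list is an ordinary data frame, so anything you would do to the full data can be done to a subset inside the function you pass to sapply.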

Given that you want to run the t.test on the variable "dv", the approach would look like this:

split_dat <- split(dat, dat$v2)

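sapply(split_dat, function(sub_dat) {
  # Welch t-test of dv between conditions a and b within this level of v2
  result <- t.test(sub_dat[sub_dat$v1 == "a", "dv"],
                   sub_dat[sub_dat$v1 == "b", "dv"])
  result$p.value  # return only the p-value
})
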
# Result (p-values; yours will differ because dv is sampled randomly):
#         x         y         z 
# 0.1220663 0.6092622 0.8887763
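
The question also mentions correcting for multiple comparisons. As a rough sketch, the p-values can be passed to p.adjust(). The Holm method and the set.seed() call are my assumptions; the question does not say which correction is wanted. This version also uses the formula interface of t.test, which should be equivalent to subsetting by hand here.

```r
# Rebuild the example data; set.seed() is an assumption for reproducibility.
set.seed(1)
dat <- data.frame(v1 = rep(c("a", "a", "b", "b"), 3),
                  v2 = c(rep("x", 4), rep("y", 4), rep("z", 4)),
                  dv = sample(1:100, 12),
                  id = rep(c("p1", "p2"), 6))

# One Welch t-test per level of v2, comparing dv between the two levels of v1
p_raw <- sapply(split(dat, dat$v2), function(sub_dat) {
  t.test(dv ~ v1, data = sub_dat)$p.value
})

# Adjust the p-values for multiple comparisons.
# "holm" is an assumed choice; see ?p.adjust for the other methods.
p_adj <- p.adjust(p_raw, method = "holm")

p_raw
p_adj
```

p.adjust keeps the names of the input vector, so the adjusted values are still labelled x, y and z.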