Mamba - 3 months ago 5
R Question

# How to iterate over groups and combinations of factors to t-test the differences in means?

I have the following data structure:

``````date <- as.Date(as.character( c("2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15")))

name <- c("John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas")

drinks <-c("Beer","Coffee","Tee",
"Tee","Beer", "Coffee",
"Coffee","Tee","Beer",
"Beer","Coffee","Tee",
"Tee","Beer", "Coffee",
"Coffee","Tee","Beer",
"Beer","Coffee","Tee",
"Tee","Beer", "Coffee",
"Coffee","Tee","Beer")

consumed <- c(3,2,5,3,6,2,9,4,5,
1,3,5,8,0,1,2,3,5,
1,24,4,5,7,9,9,1,2)

version_1 <- data.frame(date,name,drinks,consumed)
``````

My second data frame is almost identical, except for the consumption:

``````consumed <- c(10,9,1,20,30,1,50,40,20,
10,2,10,2,1,1,2,3,5,
20,24,1,40,2,8,4,0,0)

version_2 <- data.frame(date,name,drinks,consumed)

version_1$version <- rep("one", nrow(version_1))
version_2$version <- rep("two", nrow(version_2))
all <- rbind(version_1, version_2)

all$version <- as.factor(all$version)

date    name drinks consumed version
1  2015-02-13    John   Beer        3     one
2  2015-02-13 Michael Coffee        2     one
3  2015-02-13  Thomas    Tee        5     one
4  2015-02-13    John    Tee        3     one
5  2015-02-13 Michael   Beer        6     one
6  2015-02-13  Thomas Coffee        2     one
7  2015-02-13    John Coffee        9     one
8  2015-02-13 Michael    Tee        4     one
9  2015-02-13  Thomas   Beer        5     one
10 2015-02-14    John   Beer        1     one
11 2015-02-14 Michael Coffee        3     one
12 2015-02-14  Thomas    Tee        5     one
13 2015-02-14    John    Tee        8     one
14 2015-02-14 Michael   Beer        0     one
15 2015-02-14  Thomas Coffee        1     one
16 2015-02-14    John Coffee        2     one
17 2015-02-14 Michael    Tee        3     one
18 2015-02-14  Thomas   Beer        5     one
19 2015-02-15    John   Beer        1     one
20 2015-02-15 Michael Coffee       24     one
21 2015-02-15  Thomas    Tee        4     one
22 2015-02-15    John    Tee        5     one
23 2015-02-15 Michael   Beer        7     one
24 2015-02-15  Thomas Coffee        9     one
25 2015-02-15    John Coffee        9     one
26 2015-02-15 Michael    Tee        1     one
27 2015-02-15  Thomas   Beer        2     one
28 2015-02-13    John   Beer       10     two
29 2015-02-13 Michael Coffee        9     two
30 2015-02-13  Thomas    Tee        1     two
31 2015-02-13    John    Tee       20     two
32 2015-02-13 Michael   Beer       30     two
33 2015-02-13  Thomas Coffee        1     two
34 2015-02-13    John Coffee       50     two
35 2015-02-13 Michael    Tee       40     two
36 2015-02-13  Thomas   Beer       20     two
37 2015-02-14    John   Beer       10     two
38 2015-02-14 Michael Coffee        2     two
39 2015-02-14  Thomas    Tee       10     two
40 2015-02-14    John    Tee        2     two
41 2015-02-14 Michael   Beer        1     two
42 2015-02-14  Thomas Coffee        1     two
43 2015-02-14    John Coffee        2     two
44 2015-02-14 Michael    Tee        3     two
45 2015-02-14  Thomas   Beer        5     two
46 2015-02-15    John   Beer       20     two
47 2015-02-15 Michael Coffee       24     two
48 2015-02-15  Thomas    Tee        1     two
49 2015-02-15    John    Tee       40     two
50 2015-02-15 Michael   Beer        2     two
51 2015-02-15  Thomas Coffee        8     two
52 2015-02-15    John Coffee        4     two
53 2015-02-15 Michael    Tee        0     two
54 2015-02-15  Thomas   Beer        0     two
``````

I would like to loop over the data frame and t-test the difference in means between the two versions (one vs. two). Each date has exactly one observation for each combination of name and drink. Thus I would like to test:

2015-02-13 John Beer 3 one
2015-02-14 John Beer 1 one
2015-02-15 John Beer 1 one

versus

2015-02-13 John Beer 10 two
2015-02-14 John Beer 10 two
2015-02-15 John Beer 20 two

and so on for each name and drink combination, with the dates serving as the observations in each group.

I just can't figure out how to achieve that:

``````for (i in 1:length(date)){
temp <- all[all$date==date[i],]

}
``````

Answer

Using `data.table`:

``````library(data.table)
setDT(all)

all[, t.test(consumed[version == "one"], consumed[version == "two"]), by = .(name,drinks)]
name drinks  statistic parameter    p.value   conf.int  estimate null.value alternative                  method                                                 data.name
1:    John   Beer -3.4320324  2.159744 0.06761534 -25.303554  1.666667          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
2:    John   Beer -3.4320324  2.159744 0.06761534   1.970221 13.333333          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
3: Michael Coffee -0.2067737  3.960582 0.84638132 -28.960658  9.666667          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
4: Michael Coffee -0.2067737  3.960582 0.84638132  24.960658 11.666667          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
5:  Thomas    Tee  0.2208631  2.049375 0.84525800 -12.025434  4.666667          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
6:  Thomas    Tee  0.2208631  2.049375 0.84525800  13.358768  4.000000          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
7:    John    Tee -1.3850647  2.070089 0.29640280 -61.453187  5.333333          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
8:    John    Tee -1.3850647  2.070089 0.29640280  30.786521 20.666667          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
9: Michael   Beer -0.6835859  2.210972 0.55885626 -45.015433  4.333333          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
10: Michael   Beer -0.6835859  2.210972 0.55885626  31.682100 11.000000          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
11:  Thomas Coffee  0.1942572  3.977345 0.85549254  -8.883193  4.000000          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
12:  Thomas Coffee  0.1942572  3.977345 0.85549254  10.216527  3.333333          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
13:    John Coffee -0.7570982  2.088564 0.52510317 -77.499374  6.666667          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
14:    John Coffee -0.7570982  2.088564 0.52510317  53.499374 18.666667          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
15: Michael    Tee -0.9049035  2.018804 0.46026242 -66.647341  2.666667          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
16: Michael    Tee -0.9049035  2.018804 0.46026242  43.314008 14.333333          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
17:  Thomas   Beer -0.7113284  2.110684 0.54726281 -29.270500  4.000000          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
18:  Thomas   Beer -0.7113284  2.110684 0.54726281  20.603833  8.333333          0   two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
``````

This runs `t.test` on the two versions (`consumed[version == "one"], consumed[version == "two"]`) separately for each group (`by = .(name,drinks)`).

Each group produces two rows because the confidence interval and the estimate each contain two values (lower and upper bound, mean of x and mean of y). All other columns hold a single value, which is repeated to fill both rows.
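To see where the doubling comes from, it can help to inspect a single test on its own. The sketch below uses the John/Beer values from the question; the variable names `one`, `two` and `tt` are only illustrative.

``````# Consumption for John / Beer across the three dates, taken from the question
one <- c(3, 1, 1)
two <- c(10, 10, 20)

# Welch two-sample t-test, the default for t.test()
tt <- t.test(one, two)

# Number of values held by each component of the returned htest object;
# conf.int and estimate hold two values each, the others hold one
lengths(unclass(tt))

tt$conf.int   # lower and upper bound of the 95% interval
tt$estimate   # mean of x and mean of y
``````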

We can avoid this by storing each test result as a single element of a list column in our data.table, by wrapping the call in `list(...)`:

``````result <- all[, .(ttest = list(t.test(consumed[version == "one"], consumed[version == "two"]))), by = .(name,drinks)]
result
name drinks   ttest
1:    John   Beer <htest>
2: Michael Coffee <htest>
3:  Thomas    Tee <htest>
4:    John    Tee <htest>
5: Michael   Beer <htest>
6:  Thomas Coffee <htest>
7:    John Coffee <htest>
8: Michael    Tee <htest>
9:  Thomas   Beer <htest>
``````

We can then retrieve the result for a single group with:

``````result[name == "John" & drinks == "Beer", ttest]
[[1]]

Welch Two Sample t-test

data:  consumed[version == "one"] and consumed[version == "two"]
t = -3.432, df = 2.1597, p-value = 0.06762
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-25.303554   1.970221
sample estimates:
mean of x mean of y
1.666667 13.333333
``````
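If only a few numbers per group are needed, an alternative to the list column is to pull the scalar pieces out of each test inside `j`, which gives one row per group. This is a minimal sketch rather than part of the approach above: it rebuilds the question's data in compact form, and names such as `all_dt`, `summary_dt` and the output columns are assumptions made for illustration.

``````library(data.table)

# Rebuild the question's data compactly (same values, same order)
dates  <- rep(as.Date(c("2015-02-13", "2015-02-14", "2015-02-15")), each = 9)
name   <- rep(c("John", "Michael", "Thomas"), times = 9)
drinks <- rep(c("Beer", "Coffee", "Tee",
                "Tee", "Beer", "Coffee",
                "Coffee", "Tee", "Beer"), times = 3)
consumed_one <- c(3, 2, 5, 3, 6, 2, 9, 4, 5,
                  1, 3, 5, 8, 0, 1, 2, 3, 5,
                  1, 24, 4, 5, 7, 9, 9, 1, 2)
consumed_two <- c(10, 9, 1, 20, 30, 1, 50, 40, 20,
                  10, 2, 10, 2, 1, 1, 2, 3, 5,
                  20, 24, 1, 40, 2, 8, 4, 0, 0)

# all_dt is a hypothetical name, chosen to avoid masking base::all()
all_dt <- data.table(
  date     = rep(dates, 2),
  name     = rep(name, 2),
  drinks   = rep(drinks, 2),
  consumed = c(consumed_one, consumed_two),
  version  = rep(c("one", "two"), each = 27)
)

# One row per name/drink group: run the test once, keep only scalar pieces
summary_dt <- all_dt[, {
  tt <- t.test(consumed[version == "one"], consumed[version == "two"])
  .(mean_one = tt$estimate[[1]],
    mean_two = tt$estimate[[2]],
    t_stat   = tt$statistic[[1]],
    df       = tt$parameter[[1]],
    p_value  = tt$p.value,
    ci_low   = tt$conf.int[1],
    ci_high  = tt$conf.int[2])
}, by = .(name, drinks)]

summary_dt
``````

Splitting `conf.int` and `estimate` into separate columns is what keeps each group on a single row; the trade-off is that the full `htest` object is no longer available for printing.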