DrPineapple - 1 month ago
R Question

Apply function to each cell across multiple dataframes in R

Say that I have N dataframes with identical dimensions (same number of rows and columns):

set.seed(2)
df1 <- data.frame(replicate(100,rnorm(100)))
df2 <- data.frame(replicate(100,rnorm(100)))
dfN <- data.frame(replicate(100,rnorm(100)))


And I want to apply a function (in this case t.test()) across each "cell" of the N dataframes, so that the result is a separate dataframe containing a t value for each cell-wise test. Essentially, I want to take the first cell of each dataframe,

one <- df1[1,1]
two <- df2[1,1]
Nth <- dfN[1,1]


perform a t.test() on those cells,

first.cell.each <- cbind.data.frame(one,two,Nth)
t.test(first.cell.each, mu=0)


And repeat that across all cells (in this case 10000).

edit: clarified

Answer

We can create a matrix 'res' with the same dimensions as the individual datasets to hold the results. Then we loop over the row and column indices, extract the corresponding element from each dataset, concatenate those elements into a vector, run t.test on it, and assign the output to the same row/column index of 'res'. The code below stores the p-value; to store the t value the question asks for, extract $statistic instead of $p.value.

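res <- matrix(NA_real_, nrow = nrow(df1), ncol = ncol(df1))
for (i in seq_len(nrow(df1))) {
  for (j in seq_len(ncol(df1))) {
    # one-sample t-test on the (i, j) cell taken from every data frame
    res[i, j] <- t.test(c(df1[i, j], df2[i, j], dfN[i, j]), mu = 0)$p.value
  }
}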

This returns a 100 x 100 matrix of p-values:

str(res)
#num [1:100, 1:100] 0.629 0.5 0.131 0.769 0.348 ...
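
Since the question asks for t values rather than p-values, here is a minimal sketch of the same loop that keeps both. It uses small 5 x 4 data frames so it runs quickly; those dimensions and the names t_vals, p_vals and t_df are only illustrative assumptions, not part of the original code.

```r
# Hypothetical small inputs (5 rows x 4 columns) so the sketch runs quickly
set.seed(2)
df1 <- data.frame(replicate(4, rnorm(5)))
df2 <- data.frame(replicate(4, rnorm(5)))
dfN <- data.frame(replicate(4, rnorm(5)))

# Preallocate result matrices with the same shape as the inputs
t_vals <- matrix(NA_real_, nrow = nrow(df1), ncol = ncol(df1))
p_vals <- t_vals

for (i in seq_len(nrow(df1))) {
  for (j in seq_len(ncol(df1))) {
    # Run the test once per cell and pull out both pieces of the result
    tt <- t.test(c(df1[i, j], df2[i, j], dfN[i, j]), mu = 0)
    t_vals[i, j] <- unname(tt$statistic)  # t value
    p_vals[i, j] <- tt$p.value            # two-sided p-value
  }
}

# Convert to a data frame if that is the shape you need downstream
t_df <- as.data.frame(t_vals)
```

With only three values per cell each test has 2 degrees of freedom, so the individual results are noisy.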

If there are many datasets, we can place them in a list, convert the list to a 3-D array, and run t.test on each cell with apply:

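lst <- mget(paste0("df", c(1, 2, "N")))
# stack the data frames into a rows x columns x datasets array
ar1 <- array(unlist(lst), dim = c(dim(df1), length(lst)))
# for each (row, column) cell, test the values across the third dimension
res2 <- apply(ar1, c(1, 2), function(x) t.test(x, mu = 0)$p.value)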
str(res2)
# num [1:100, 1:100] 0.629 0.5 0.131 0.769 0.348 ...
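
If calling t.test() 10000 times is too slow, the one-sample t statistic can also be computed for every cell at once with plain matrix arithmetic. This is only a sketch, not part of the approach above: it assumes all data frames are numeric, have the same dimensions, and contain no missing values. The names dfs, mats, cell_mean, cell_var, t_vals and p_vals are made up for the example, and the inputs are shrunk to 5 x 4.

```r
# Hypothetical small inputs (5 rows x 4 columns)
set.seed(2)
df1 <- data.frame(replicate(4, rnorm(5)))
df2 <- data.frame(replicate(4, rnorm(5)))
dfN <- data.frame(replicate(4, rnorm(5)))

dfs  <- list(df1, df2, dfN)
n    <- length(dfs)
mats <- lapply(dfs, as.matrix)

# Cell-wise mean across the n data frames
cell_mean <- Reduce(`+`, mats) / n

# Cell-wise sample variance (denominator n - 1), the same estimate sd() uses
cell_var <- Reduce(`+`, lapply(mats, function(m) (m - cell_mean)^2)) / (n - 1)

# One-sample t statistic against mu = 0 and its two-sided p-value
t_vals <- cell_mean / sqrt(cell_var / n)
p_vals <- 2 * pt(-abs(t_vals), df = n - 1)

# Spot check one cell against t.test()
chk <- t.test(c(df1[2, 3], df2[2, 3], dfN[2, 3]), mu = 0)
all.equal(unname(t_vals[2, 3]), unname(chk$statistic))
```

The trade-off is that this reproduces only the statistic and p-value of the default two-sided test; confidence intervals and other t.test() options would need to be derived separately.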