Alex Spedding - 4 months ago 21

R Question

I'm finding it hard to put what I want into words so I will try to run through an example to explain it. Let's say I've repeated an experiment twice and have two tables:

`[df1] [df2]`

X Y X Y

2 3 4 1

5 2 2 4

These tables are stored in a list (where the list can contain more than two elements if necessary), and what I want to do is create an average of each cell in the tables across the list (or for a generalised version, apply any function I choose to the cells i.e. mad, sd, etc)

`[df1] [df2] [dfMeans]`

X Y X Y X Y

2 3 4 1 mean(2,4) mean(3,1)

5 2 2 4 mean(5,2) mean(2,4)

I have a code solution to my problem, but since this is in R there is most likely a cleaner way to do things:

`df1 <- data.frame(X=c(2,3,4),Y=c(3,2,1))`

df2 <- data.frame(X=c(5,1,3),Y=c(4,1,4))

df3 <- data.frame(X=c(2,7,4),Y=c(1,7,6))

dfList <- list(df1,df2,df3)

dfMeans <- data.frame(MeanX=c(NA,NA,NA),MeanY=c(NA,NA,NA))

for (rowIndex in 1:nrow(df1)) {

for (colIndex in 1:ncol(df1)) {

valuesAtCell <- c()

for (tableIndex in 1:length(dfList)) {

valuesAtCell <- c(valuesAtCell, dfList[[tableIndex]][rowIndex,colIndex])

}

dfMeans[rowIndex, colIndex] <- mean(valuesAtCell)

}

}

print(dfMeans)

Answer

Here is a `data.table`

solution where the mean is applied row-wise across the data frames:

```
library(data.table)
dtList <- rbindlist(dfList, use.names = TRUE, idcol = TRUE)
dtList
.id X Y
1: 1 2 3
2: 1 3 2
3: 1 4 1
4: 2 5 4
5: 2 1 1
6: 2 3 4
7: 3 2 1
8: 3 7 7
9: 3 4 6
dtList[, rn := 1:.N, by = .id][][, .(X = mean(X), Y = mean(Y)), by = rn]
rn X Y
1: 1 3.000000 2.666667
2: 2 3.666667 3.333333
3: 3 3.666667 3.666667
```

You can replace the `mean`

by another aggregation function, eg, `median`

. The `.id`

column numbers the original data frames each row was sourced from.

**Edit**

The solution can be extended to an *arbitrary* number of columns (provided column names and column order are identical in all data frames):

```
cn <- colnames(df1)
cn
[1] "X" "Y"
dtList[, rn := 1:.N, by = .id][, lapply(.SD, mean), by = rn, .SDcols = cn][, rn := NULL][]
X Y
1: 3.000000 2.666667
2: 3.666667 3.333333
3: 3.666667 3.666667
```

The column names are taken from one of the original data frames which adds to the flexibility of the solution. `[, rn := NULL]`

removes the row numbers from the result, `[]`

ensures the result ist printed.