Paul van Oppen - 22 days ago
R Question

Iterate through two lists with apply functions

I have a list of data frames in which each column has a name in the first row and x-s at some positions further down. If a row has an x in a column, the name in the first row of that column is viewed as selected.
In the real-world problem I read an xlsx file with many sheets, where each sheet contains a large matrix: each column has a name in the first row and many x-s in a somewhat sparse matrix. Each sheet becomes a data frame in a list of data frames. The row names contain an identifier which is relevant to the lookup but not to the issue described here.

data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)

data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)

data <- list(data1 = data1, data2 = data2)


Each data frame in the list has the following structure (shown as a matrix for convenience) where the names are the same for each data frame in the list. Only the x-s are different:

> as.matrix(data1)
     Col1   Col2   Col3   Col4   Col5
[1,] "Mark" "Paul" "Jane" "Mary" "Peter"
[2,] "x"    ""     ""     "x"    "x"
[3,] ""     ""     ""     "x"    "x"
[4,] "x"    ""     ""     "x"    "x"
[5,] ""     "x"    ""     ""     ""
[6,] ""     ""     ""     ""     ""


I would like to add one column ("Approvers") to each data frame in the list. For each row, it should hold the concatenation of the names from row 1 for every column that has an 'x' in that row, as follows:

     Col1   Col2   Col3   Col4   Col5    Approvers
[1,] "Mark" "Paul" "Jane" "Mary" "Peter" ""
[2,] "x"    ""     ""     "x"    "x"     "Mark; Mary; Peter"
[3,] ""     ""     ""     "x"    "x"     "Mary; Peter"
[4,] "x"    ""     ""     "x"    "x"     "Mark; Mary; Peter"
[5,] ""     "x"    ""     ""     ""      "Paul"
[6,] ""     ""     ""     ""     ""      ""


At the moment I resolve this in two steps:


  1. I create another list of lists that holds the column positions of each x

  2. In a nested for loop I look up all the names in the first row and concatenate them.



The code is as follows:

position <- lapply(data, function(x) apply(x, 1, function(y) which(y %in% "x")))
position <- lapply(position, function(x) lapply(x, function(y) {if (length(y) == 0L) return(0) else return(y)})) # remove int(0) and replace with 0
position <- lapply(position, function(x) lapply(x, function(x) paste(x, collapse = ","))) # flatten second level list into string


for (i in 1:length(data)) {
  for (j in 1:nrow(data[[i]])) {
    if (as.numeric(unlist(strsplit(position[[i]][[j]], ",")))[[1]] == 0) {
      data[[i]][j, "Approvers"] <- ""
    } else {
      data[[i]][j, "Approvers"] <- paste(data[[i]][1, as.numeric(unlist(strsplit(position[[i]][[j]], ",")))], collapse = "; ")
    }
  }
}


To me this is clumsy. I would like to do this using lapply and mapply by looping through both lists simultaneously, but I cannot figure out how. Also, creating the position object, collapsing the column indices of the x-s into a string, and then separating them again in the loop is overly complicated.

Answer Source

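We can use lapply to loop over the list, then use apply to loop over the rows of each data frame and paste together the elements of the first row wherever the row's value is "x". Note that this uses ";" as the separator rather than the "; " shown in the question: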

res <- lapply(data, function(x) {
       x$Approvers <- apply(x, 1, FUN = function(y) paste(x[1,][y =="x"], collapse=";"))
       x})
res
#$data1
#  Col1 Col2 Col3 Col4  Col5       Approvers
#1 Mark Paul Jane Mary Peter                
#2    x              x     x Mark;Mary;Peter
#3                   x     x      Mary;Peter
#4    x              x     x Mark;Mary;Peter
#5         x                            Paul
#6                                          

#$data2
#  Col1 Col2 Col3 Col4  Col5       Approvers
#1 Mark Paul Jane Mary Peter                
#2    x              x     x Mark;Mary;Peter
#3    x                    x      Mark;Peter
#4                   x                  Mary
#5                                          
#6                                          
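
The question also asks how to walk both lists at the same time. Below is a minimal sketch of one way this might be done with Map (a wrapper around mapply), assuming the positions are kept as integer vectors instead of being collapsed into strings. The names add_approvers and res2 are made up for illustration and do not appear in the original post. The separator is "; " to match the desired output in the question.

```r
# Rebuild the example data from the question so the sketch runs on its own.
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)
data2 <- data.frame(Col1 = c("Mark", "x", "x", "", "", ""),
                    Col2 = c("Paul", "", "", "", "", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "", "", ""),
                    stringsAsFactors = FALSE)
data <- list(data1 = data1, data2 = data2)

# One list per data frame, holding the column positions of the x-s per row.
# lapply over row indices always returns a list, even when every row has
# the same number of x-s (apply could simplify that case to a matrix).
position <- lapply(data, function(df) {
  lapply(seq_len(nrow(df)), function(i) unname(which(unlist(df[i, ]) == "x")))
})

# Hypothetical helper: takes one data frame and its matching position list.
add_approvers <- function(df, pos) {
  header <- unlist(df[1, ], use.names = FALSE)  # names stored in row 1
  # integer(0) selects nothing, and paste() then yields "" for that row
  df$Approvers <- vapply(pos, function(p) paste(header[p], collapse = "; "),
                         character(1))
  df
}

# Map walks both lists in parallel and keeps the names of the first list.
res2 <- Map(add_approvers, data, position)
```

Because empty positions are simply integer(0), no placeholder 0 and no string round trip are needed.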

NOTE: It seems like the column names of the datasets should be 'Mark', 'Paul', etc. instead of 'Col1', 'Col2', ...
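
If the first row is really a header, one option is to promote it to the column names before building the Approvers column. The following is only a sketch under that assumption; promote_header and clean are hypothetical names, not part of the original post, and it is shown for data1 only.

```r
# Example data from the question (first data frame only).
data1 <- data.frame(Col1 = c("Mark", "x", "", "x", "", ""),
                    Col2 = c("Paul", "", "", "", "x", ""),
                    Col3 = c("Jane", "", "", "", "", ""),
                    Col4 = c("Mary", "x", "x", "x", "", ""),
                    Col5 = c("Peter", "x", "x", "x", "", ""),
                    stringsAsFactors = FALSE)

# Hypothetical helper: use the first row as column names, then drop it.
promote_header <- function(df) {
  names(df) <- unlist(df[1, ], use.names = FALSE)
  df <- df[-1, , drop = FALSE]
  rownames(df) <- NULL  # renumber the remaining rows from 1
  df
}

clean <- promote_header(data1)

# clean == "x" is a logical matrix whose column names are the people's names,
# so each row can be turned into a string of the selected names directly.
clean$Approvers <- unname(apply(clean == "x", 1, function(sel) {
  paste(names(sel)[sel], collapse = "; ")
}))
```

With a list of data frames, the same two steps could be wrapped in a function and passed to lapply.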