leerssej leerssej - 29 days ago 9
R Question

Iterate through columns in dplyr

I am trying to iterate a function progressively through the columns of a database. What are the options available for this? Am I limited to a for loop or is there a dplyr approach or similarly intuitive code structure I can use?

Essentially I have a matrix much much larger than the one constructed below, but of the same general structure. The first column explains which version to select as the refined datacollection, and then all the data that comprises each of these three versions.

library(dplyr)

# Function: creates a matrix of random strings, v = versionNumber
matADv.maker <- function (v){
matADv <- data.frame(matrix(sample(letters[1:26], 10), nrow = 5))
colnames(matADv) <- paste0("v", v, "_", letters[24:25])
return(matADv)
}

set.seed(1)
lvl <- data.frame(c(as.integer(runif(5, 5, 8))))
colnames(lvl) <- "Level"
matADv5 <- matADv.maker(5)
matADv6 <- matADv.maker(6)
matADv7 <- matADv.maker(7)
matComp <- bind_cols(lvl, matADv5, matADv6, matADv7)
matComp
Source: local data frame [5 x 8]



rowname Level v5_x v5_y v6_x v6_y v7_x v7_y
(chr) (int) (chr) (chr) (chr) (chr) (chr) (chr)
1 1 5 x e m t k z
2 2 6 z d r e a l
3 3 6 p n x z j x
4 4 7 o g i c u d
5 5 5 b s y u h o



I was wondering if there isn't instead some simple way to shuttle a function along the dataframe.

Essentially, How might I be able to iterate through columns by using dplyr or similar simpler constructions than for loops?

Answer

Well, I now realize Hadley had already prepared the way for the best solution of all; I need to process the column transformations in bulk. I collect all the columns with the same issue (filter), resolve the issues, and then join all the frames back together in a Split, Apply, Combine approach.

More efficient too: tackling the problem as 3 tables instead of hundreds of columnar iterations testing each data point and then reacting consecutively.

library(magrittr)

matComp %<>% add_rownames
v5Mat <- matComp %>% filter(Level == 5) %>% select(rowname, starts_with("v5"))
v6Mat <- matComp %>% filter(Level == 6) %>% select(rowname, starts_with("v6"))
v7Mat <- matComp %>% filter(Level == 7) %>% select(rowname, starts_with("v7"))
colnames(v5Mat) %<>%  gsub("v\\d_", "ref", .)
colnames(v6Mat) %<>%  gsub("v\\d_", "ref", .)
colnames(v7Mat) %<>%  gsub("v\\d_", "ref", .)
refinedMat <- 
    Reduce(function(...) merge(..., all=TRUE), list(matComp, v5Mat, v6Mat, v7Mat)) %>% 
    group_by(rowname) %>% 
    summarise_each(funs(na.omit))
refinedMat

results in:

 Source: local data frame [5 x 10]

   rowname  refx  refy Level  v5_x  v5_y  v6_x  v6_y  v7_x  v7_y
     (chr) (chr) (chr) (int) (chr) (chr) (chr) (chr) (chr) (chr)
 1       1     x     e     5     x     e     m     t     k     z
 2       2     r     e     6     z     d     r     e     a     l
 3       3     x     z     6     p     n     x     z     j     x
 4       4     u     d     7     o     g     i     c     u     d
 5       5     b     s     5     b     s     y     u     h     o
Comments