Alex - 1 year ago
R Question

Vectorise this R loop and custom function

I have a data frame with scores in named columns. I need to get the row means of certain clusters of columns, as per an index file that defines which columns need to be grouped. I'd like to do this for all clusters at once; currently it is done in a loop that passes in the current 'cluster' to work on. See below.

I have two data frames. One is an index file with the following columns (the real one has much more; this is obviously just an example):

index <- data.frame(area=c("area1","area1","area1","area2","area2","area2","area1",
"area1","area4","area5"), name=c(paste0("name",sample(6,10,replace=T))))

The other is a data file; again, here is a cut-down example:

data <- data.frame(name1=sample(10,5),name2=sample(10,5),name3=sample(10,5),
                   name4=sample(10,5),name5=sample(10,5),name6=sample(10,5))

I made a function that returns the row means of the columns of the 'data' df that make up an area, according to the 'index' df:

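myfun <- function(area) {
  target.cols <- as.character(index$name[index$area == area])  # columns listed for this area
  rowMeans(data[, target.cols, drop = FALSE])  # drop = FALSE keeps single-column areas as a data frame
}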
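One pitfall worth checking with a function like this: some areas map to a single column (area4 and area5 in the example index each have only one name). The minimal sketch below, using hypothetical toy values, shows why that matters: selecting one column with data[, col] drops it to a plain vector, which rowMeans() rejects, whereas drop = FALSE (or list-style data[col]) keeps a one-column data frame.

```r
# Hypothetical toy data: five rows, three score columns
data <- data.frame(name1 = 1:5, name2 = 6:10, name3 = 11:15)

# Default [, j] indexing drops a single selected column to a plain vector,
# and rowMeans() refuses anything with fewer than two dimensions
single <- data[, "name1"]
is.data.frame(single)   # FALSE: it is an integer vector
res <- tryCatch(rowMeans(single), error = function(e) conditionMessage(e))

# drop = FALSE keeps a one-column data frame, so rowMeans() works
kept <- data[, "name1", drop = FALSE]
kept.means <- rowMeans(kept)   # same values as data$name1, returned as doubles
```

List-style indexing, data["name1"], never drops, which is one reason the answer's data[x] form handles single-column areas without extra arguments.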

I then use the function to get row means for areas by looping through the areas.

for (i in seq_along(unique(index$area))){
  data[,as.character(unique(index$area))[i]] <- myfun(as.character(unique(index$area))[i])
}

I've been racking my brain trying to do this in one line (once the function is written), but I just can't put my finger on it. Any suggestions?
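For reference, a minimal sketch of the one-line form asked about here, once the function exists: lapply() calls the function for every area, and a single list assignment adds all the new columns at once. The small index and data below are hypothetical fixed stand-ins for the randomly sampled ones, and the function body is one plausible version of myfun, not necessarily the asker's exact code.

```r
# Hypothetical fixed inputs (the question's versions use sample())
index <- data.frame(
  area = c("area1", "area1", "area2", "area2", "area3"),
  name = c("name1", "name2", "name2", "name3", "name3"),
  stringsAsFactors = FALSE
)
data <- data.frame(name1 = c(1, 2, 3), name2 = c(4, 5, 6), name3 = c(7, 8, 9))

# Row means of the columns that the index assigns to one area
myfun <- function(area) {
  target.cols <- index$name[index$area == area]
  rowMeans(data[, target.cols, drop = FALSE])
}

# One assignment replaces the for loop: lapply() returns one vector per area,
# and assigning that list to data[areas] adds each vector as a new column
areas <- unique(index$area)
data[areas] <- lapply(areas, myfun)
```

Because lapply() is fully evaluated before the assignment, myfun still sees only the original columns of data.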

Answer

We can split the 'name' column in 'index' by 'area', then loop through the resulting list with sapply, subset 'data' by the names in each list element, and get the rowMeans:

sapply(split(as.character(index$name), index$area), function(x) rowMeans(data[x]))
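To make the steps concrete, here is a small worked sketch on hypothetical fixed data (not the question's sampled data). It shows the named list that split() produces, the matrix that sapply() returns (one column per area), and how to attach it to data with cbind(). The last lines swap in a different technique that is not part of the answer: a single matrix product against a weights matrix built with table(). It gives the same means, including counting a name that is listed twice under one area twice, just as rowMeans(data[x]) does.

```r
# Hypothetical fixed inputs; note name3 is listed twice under area1
index <- data.frame(
  area = c("area1", "area1", "area1", "area2", "area4"),
  name = c("name1", "name3", "name3", "name2", "name4"),
  stringsAsFactors = FALSE
)
data <- data.frame(name1 = c(1, 2), name2 = c(3, 4), name3 = c(5, 6), name4 = c(7, 8))

# Step 1: split() groups the column names by area into a named list
groups <- split(index$name, index$area)
# groups$area1 is c("name1", "name3", "name3"); groups$area4 is "name4"

# Step 2: sapply() computes rowMeans() per area and simplifies the result
# to a matrix with one row per row of data and one column per area
area.means <- sapply(groups, function(x) rowMeans(data[x]))

# Step 3 (optional): attach the area means to the original data
data.with.areas <- cbind(data, area.means)

# Alternative technique (not from the answer): one matrix product.
# counts[n, a] = how many times column n is listed under area a
counts <- unclass(table(index$name, index$area))
weights <- sweep(counts, 2, colSums(counts), "/")  # each area's weights sum to 1
alt <- as.matrix(data[rownames(weights)]) %*% weights
```

The matrix-product form removes the per-area loop entirely at the cost of building a names-by-areas weights matrix; for a handful of areas, the sapply() version is arguably easier to read.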