zdilli - 24 days ago
R Question

R: Operating on a subset of columns from a dataframe with ddply

I have a large-ish dataframe (40000 observations of 800 variables) and want to compute, for every observation, something akin to a dot product over a range of its columns. This is how I implemented it:

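matrixattempt <- as.matrix(dframe)
# Row k of the matrix as a plain vector
takerow <- function(k) { as.vector(matrixattempt[k, ]) }
# Square root of the dot product of columns 2:785 of row k with data0averrow
takedot0 <- function(k) { sqrt(sum(data0averrow * takerow(k)[2:785])) }

for (k in 1:40000) {
  print(k)
  dframe$dot0aver[k] <- takedot0(k)
}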


The print is just there to track progress. data0averrow is a predefined numeric vector of the same length as takerow(k)[2:785].

This runs, and from a few tests it gives correct results, but it is very slow.

I searched for how to take a dot product over a subset of columns and found this question, but could not figure out how to apply it to my setup. ddply sounds like it should be faster (although I do not want to split the data, and would have to use the same define-an-id trick that the referenced questioner did). Any insights or hints?

Answer

Operate on the whole matrix at once instead of looping over rows. Try this:

sqrt(colSums(t(matrixattempt[, 2:785]) * data0averrow))

or equivalently, as a matrix product (this returns a one-column matrix rather than a plain vector):

sqrt(matrixattempt[, 2:785] %*% data0averrow)
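
To check that both forms match the original loop, here is a minimal sketch on a small synthetic data frame. The sizes, the `weights` vector (standing in for data0averrow) and the `cols` range (standing in for 2:785) are made up for illustration; only the two vectorized expressions come from the answer above.

```r
# Small synthetic stand-in for the real data (sizes are illustrative only).
set.seed(1)
n_obs <- 5
n_var <- 4
dframe <- data.frame(id = seq_len(n_obs),
                     matrix(runif(n_obs * n_var), nrow = n_obs))
weights <- runif(n_var)   # hypothetical stand-in for data0averrow
cols <- 2:(n_var + 1)     # hypothetical stand-in for 2:785

m <- as.matrix(dframe)

# Loop version, mirroring the question.
loop_result <- numeric(n_obs)
for (k in seq_len(n_obs)) {
  loop_result[k] <- sqrt(sum(weights * m[k, cols]))
}

# Form 1: after transposing, each column is one observation, so R's
# column-wise recycling lines `weights` up with the variables.
v1 <- sqrt(colSums(t(m[, cols]) * weights))

# Form 2: matrix product; drop() turns the one-column result into a vector.
v2 <- drop(sqrt(m[, cols] %*% weights))

# Both vectorized forms should agree with the loop.
stopifnot(isTRUE(all.equal(loop_result, unname(v1))),
          isTRUE(all.equal(loop_result, unname(v2))))

# Store the result as a new column in a single assignment.
dframe$dot0aver <- v2
```

Assigning the whole column at once also avoids modifying the data frame one element at a time inside the loop, which is likely a large part of the original slowdown.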