zdilli - 7 months ago

R Question

I have a large-ish data frame (40,000 observations of 800 variables) and want to apply something akin to a dot product between a fixed vector and a range of columns in every row. This is how I implemented it:

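```
# Convert once so each row can be pulled out as a plain numeric vector
matrixattempt <- as.matrix(dframe)

takerow <- function(k) {as.vector(matrixattempt[k,])}

# data0averrow is multiplied elementwise with columns 2:785 of row k
takedot0 <- function(k) {sqrt(sum(data0averrow * takerow(k)[2:785]))}

for (k in 1:40000){
  print(k)
  dframe$dot0aver[k] <- takedot0(k)
}
```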

The print is just to keep track of what's going on.
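Written out, the loop computes the following for each row $k$, where $w$ stands for `data0averrow` (assumed here to be a numeric vector of length 784, one entry per column in 2:785) and $x_{k,j}$ for `matrixattempt[k, j]`:

$$
\text{dot0aver}_k = \sqrt{\sum_{j=1}^{784} w_j \, x_{k,\,j+1}}, \qquad k = 1, \dots, 40000
$$

Equivalently, if $X$ is the $40000 \times 784$ block of columns 2 through 785, the whole column is $\sqrt{Xw}$ with the square root taken elementwise.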

It runs, and a few tests suggest it is correct, but it is very slow.

I searched for "dot product for a subset of columns" and found this question, but could not figure out how to apply it to my setup. `ddply` sounds like it should be faster, although I do not want to split the data and would have to use the same define-an-id trick as the referenced question. Any insights or hints?

Answer

Drop the row loop and compute all rows in one vectorized call. Try this:

```
# Transpose so each row becomes a column; data0averrow is recycled down each column
sqrt(colSums(t(matrixattempt[, 2:785]) * data0averrow))
```
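The first form relies on R storing matrices column by column and recycling the shorter vector down each column. A minimal sketch on a hypothetical 2 × 3 matrix `M` and weight vector `v` (toy values, not from the question) shows that `colSums(t(M) * v)` yields one dot product per row of `M`:

```r
# Toy matrix: rows are (1, 2, 3) and (4, 5, 6)
M <- matrix(1:6, nrow = 2, byrow = TRUE)
v <- c(1, 0, 2)

# t(M) is 3 x 2, so each column of t(M) is one row of M;
# v (length 3) is recycled down each column and multiplied elementwise
t(M) * v

# Summing each column gives the row-wise dot products:
# 1*1 + 2*0 + 3*2 = 7 and 4*1 + 5*0 + 6*2 = 16
colSums(t(M) * v)
```

This only lines up when `length(v)` equals the number of selected columns (784 in the question); otherwise R recycles silently, or with only a warning, and the products are misaligned.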

or, equivalently, as a matrix-vector product:

```
# Returns a one-column matrix; wrap the product in drop() if you need a plain vector
sqrt(matrixattempt[, 2:785] %*% data0averrow)
```
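To check that both vectorized forms agree with the original loop, here is a sketch on hypothetical random data (100 rows, an id column plus 784 non-negative features; none of these values come from the question):

```r
# Hypothetical toy data: column 1 is an id, columns 2:785 are features
set.seed(42)
n_rows <- 100
matrixattempt <- cbind(id = seq_len(n_rows),
                       matrix(runif(n_rows * 784), nrow = n_rows))
data0averrow <- runif(784)  # assumed: one weight per feature column

# Row-by-row loop, as in the question (output preallocated)
by_loop <- numeric(n_rows)
for (k in seq_len(n_rows)) {
  by_loop[k] <- sqrt(sum(data0averrow * matrixattempt[k, 2:785]))
}

# The two vectorized forms from the answer
by_colsums <- sqrt(colSums(t(matrixattempt[, 2:785]) * data0averrow))
by_matmul  <- sqrt(drop(matrixattempt[, 2:785] %*% data0averrow))

# all.equal() compares with a small numeric tolerance
stopifnot(isTRUE(all.equal(by_loop, by_colsums)),
          isTRUE(all.equal(by_loop, by_matmul)))
```

`%*%` is usually the faster of the two, since it hands the work to the BLAS routine R is linked against and avoids the full transposed copy that `t()` builds. All three versions return `NaN` (with a warning) for any row whose weighted sum is negative, because of the `sqrt()`.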

Source (Stackoverflow)