Vespasian - 1 year ago 186

R Question

Currently I'm using the built-in function `dist` to calculate my distance matrix in R:

`dist(featureVector,method="manhattan")`

This is currently the bottleneck of the application, so the idea was to parallelize this task (conceptually this should be possible).

Searching Google and this forum turned up nothing.

Does anybody have an idea?


Answer Source

Here's the structure for one route you could go. It is not faster than just using the `dist()` function; it actually takes many times longer. It does process in parallel, but even if the computation time were reduced to zero, the time to start up the cluster and export the variables to it would probably be longer than just using `dist()`.

```
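library(parallel)

# Example data: 2000 observations with 100 features each.
vec.array <- matrix(rnorm(2000 * 100), nrow = 2000, ncol = 100)

# Manhattan distance from one row vector to every row of the matrix.
TaxiDistFun <- function(one.vec, whole.matrix) {
  diff.matrix <- t(t(whole.matrix) - one.vec)
  this.row <- apply(diff.matrix, 1, function(x) sum(abs(x)))
  return(this.row)
}

# Start one worker per core and copy the data and function to each worker.
cl <- makeCluster(detectCores())
clusterExport(cl, c("vec.array", "TaxiDistFun"))

# Each input row yields one vector of 2000 distances; time the parallel run.
system.time(dist.array <- parRapply(cl, vec.array,
                                    function(x) TaxiDistFun(x, vec.array)))
stopCluster(cl)

# parRapply returns a flat vector, so reshape it into a 2000 x 2000 matrix.
dim(dist.array) <- c(2000, 2000)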
```
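If you try a hand-rolled version like this, it may be worth confirming that it agrees with `dist()` before timing anything. The following is only a minimal sketch of such a check, run serially on a small made-up matrix. The names `small`, `taxi_row`, `by_hand` and `reference` are illustrative and not part of the code above.

```r
# Small, hypothetical input so the check runs quickly.
set.seed(1)
small <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)

# Same row-wise Manhattan distance idea as TaxiDistFun:
# subtract one.vec from every row, then sum absolute differences per row.
taxi_row <- function(one.vec, whole.matrix) {
  rowSums(abs(sweep(whole.matrix, 2, one.vec)))
}

# Serial version: apply() returns one column of distances per input row.
by_hand <- apply(small, 1, taxi_row, whole.matrix = small)

# Reference: base R dist(), expanded to a full square matrix.
reference <- as.matrix(dist(small, method = "manhattan"))

# Compare values only; dist() attaches dimnames that by_hand lacks.
same <- isTRUE(all.equal(by_hand, reference, check.attributes = FALSE))
print(same)
```

The distance matrix is symmetric, so it does not matter that `apply()` stores each result as a column rather than a row.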
