Vespasian Vespasian - 1 year ago 186
R Question

Parallel distance Matrix in R

Currently I'm using the built-in function dist to calculate my distance matrix in R.


This is currently the bottleneck of the application, so the idea was to parallelize this task (conceptually this should be possible).

Searching Google and this forum did not turn up anything.

Does anybody have an idea?

Answer Source

Here's the structure for one route you could go. It is not faster than just using the dist() function; in fact, it takes many times longer. It does run in parallel, but even if the computation time were reduced to zero, the time needed to start the cluster and export the variables to it would probably still be longer than just using dist().


library(parallel)

vec.array <- matrix(rnorm(2000 * 100), nrow = 2000, ncol = 100)

# Manhattan (taxicab) distance from one row vector to every row of the matrix
TaxiDistFun <- function(one.vec, whole.matrix) {
    diff.matrix <- t(t(whole.matrix) - one.vec)
    apply(diff.matrix, 1, function(x) sum(abs(x)))
}

cl <- makeCluster(detectCores())
clusterExport(cl, c("vec.array", "TaxiDistFun"))

system.time(dist.array <- parRapply(cl, vec.array,
                        function(x) TaxiDistFun(x, vec.array)))

stopCluster(cl)

# parRapply returns a flat vector; reshape it into the 2000 x 2000 matrix
dim(dist.array) <- c(2000, 2000)
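
If you want to check that a parallel version like this really reproduces what dist() gives, something along these lines might help. It is only a sketch under a few assumptions of mine: the names x, chunk_dist and par_dist are made up for illustration, the matrix is deliberately small, the cluster is fixed at 2 workers, and the rows are handed out in chunks with splitIndices() and parLapply() rather than one row at a time. The distance is Manhattan, to match the taxicab function above.

```r
library(parallel)

# Small illustrative input (sizes chosen arbitrarily for a quick check)
set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)

# Hypothetical helper: Manhattan distances from the rows in `idx`
# to every row of `m`; returns a length(idx) x nrow(m) matrix
chunk_dist <- function(idx, m) {
  t(vapply(idx, function(i) colSums(abs(t(m) - m[i, ])), numeric(nrow(m))))
}

cl <- makeCluster(2)
# One contiguous block of row indices per worker
chunks <- splitIndices(nrow(x), length(cl))
parts <- parLapply(cl, chunks, chunk_dist, m = x)
stopCluster(cl)

# Stack the per-worker blocks back into the full square matrix
par_dist <- do.call(rbind, parts)

# Reference result from the built-in function
ref_dist <- as.matrix(dist(x, method = "manhattan"))
max_diff <- max(abs(par_dist - ref_dist))
```

If max_diff comes out at floating-point noise level, the two agree. Wrapping the parLapply() call and the dist() call in system.time() on your real data is then the quickest way to see whether the cluster overhead is worth it in your case.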