Zhilong Jia - 2 months ago
R Question

The difference between doMC and doParallel in R

What's the difference between doParallel and doMC in R when used with the foreach function? doParallel supports Windows and Unix-like systems, while doMC supports Unix-like systems only. In other words, why can't doParallel simply replace doMC? Thank you.

Update: doParallel is built on parallel, which is essentially a merger of multicore and snow and automatically uses the appropriate tool for your system. As a result, doParallel works on multiple operating systems. In other words, we can use doParallel to replace doMC.

ref: http://michaeljkoontz.weebly.com/uploads/1/9/9/4/19940979/parallel.pdf
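A minimal sketch of the point above, assuming the foreach and doParallel packages are installed and that two workers are acceptable: the same script should run unchanged on Windows, macOS and Linux. The backend names in the comments are what doParallel is expected to report; they are not stated in the referenced PDF.

```r
library(foreach)
library(doParallel)

# Register two workers; doParallel chooses forking (Unix-like) or a
# socket cluster (Windows) behind the scenes.
registerDoParallel(cores = 2)

res <- foreach(i = 1:4, .combine = c) %dopar% {
  i * 10
}
print(res)

# Expected to report "doParallelMC" on Unix-like systems (the same
# mclapply-based mechanism doMC uses) and "doParallelSNOW" on Windows.
print(getDoParName())
print(getDoParWorkers())

# Stops the implicit cluster created on Windows; harmless elsewhere.
stopImplicitCluster()
```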

BTW, what is the difference between registerDoParallel(cores=3) and

cl <- makeCluster(3)
registerDoParallel(cl)


It seems that registerDoParallel(cores=3) stops the cluster automatically, while the second approach does not and requires an explicit stopCluster(cl).

ref: http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf

Answer

The doParallel package is a merger of doSNOW and doMC, much as parallel is a merger of snow and multicore. But although doParallel has all the features of doMC, I was told by Rich Calaway of Revolution Analytics that they wanted to keep doMC around because it was more efficient in certain circumstances, even though doMC now uses parallel just like doParallel. I haven't personally run any benchmarks to determine if and when there is a significant difference.

I tend to use doMC on a Linux or Mac OS X computer, doParallel on a Windows computer, and doMPI on a Linux cluster, but doParallel does work on all of those platforms.
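To follow the same per-platform habit without maintaining two scripts, a small helper can choose the backend at run time. This is only a sketch, not something the answer prescribes: register_backend is a hypothetical name, and it assumes foreach and doParallel are installed, with doMC used only when it is available on a Unix-like system.

```r
library(foreach)

# Hypothetical helper: use doMC on Unix-like systems when it is installed,
# otherwise fall back to doParallel, which works on every platform.
register_backend <- function(cores = 2) {
  if (.Platform$OS.type != "windows" &&
      requireNamespace("doMC", quietly = TRUE)) {
    doMC::registerDoMC(cores = cores)
  } else {
    doParallel::registerDoParallel(cores = cores)
  }
  getDoParName()  # name of the backend that was just registered
}

backend <- register_backend(cores = 2)
print(backend)

# Sum 1 + 2 + 3 in parallel, combining results with `+`
res <- foreach(i = 1:3, .combine = `+`) %dopar% i
print(res)

# On Windows, doParallel created an implicit cluster; release it.
if (identical(backend, "doParallelSNOW")) doParallel::stopImplicitCluster()
```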


As for the different registration methods, if you execute:

registerDoParallel(cores=3)

on a Windows machine, it will create a cluster object implicitly for later use with clusterApplyLB, whereas on Linux and Mac OS X, no cluster object is created or used. The number of cores is simply remembered and used as the value of the mc.cores argument later when calling mclapply.
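To make the two code paths concrete, the sketch below uses only the base parallel package to approximate what happens after registerDoParallel(cores = n). The helper run_like_doParallel is hypothetical and simplified; doParallel's real internals differ (for example, on Windows it keeps the implicit cluster alive across foreach calls, whereas this helper stops it immediately).

```r
library(parallel)

# Hypothetical, simplified approximation of doParallel's dispatch when
# registered with a core count (not doParallel's actual source).
run_like_doParallel <- function(X, FUN, cores = 2) {
  if (.Platform$OS.type == "windows") {
    # Windows: a socket cluster object is created and used with clusterApplyLB
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl))  # unlike doParallel, stop it right away
    clusterApplyLB(cl, X, FUN)
  } else {
    # Unix-like: no cluster object; the core count is passed as mc.cores
    mclapply(X, FUN, mc.cores = cores)
  }
}

res <- run_like_doParallel(1:4, function(i) i^2)
print(unlist(res))
```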

If you execute:

cl <- makeCluster(3)
registerDoParallel(cl)

then the registered cluster object will be used with clusterApplyLB regardless of the platform. You are correct that in this case it is your responsibility to shut down the cluster object, since you created it, whereas the implicit cluster object is shut down automatically.
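One way to make sure an explicitly created cluster is always shut down, even when the loop body fails, is to tie stopCluster() to on.exit() inside a function. This is a sketch assuming foreach and doParallel are installed; parallel_sqrt is a hypothetical name. Resetting to the sequential backend with registerDoSEQ() afterwards is a design choice, so that a later %dopar% does not try to use the stopped cluster.

```r
library(foreach)
library(parallel)
library(doParallel)

# Hypothetical wrapper: it creates the cluster, so it is also responsible
# for shutting it down, even if the foreach body throws an error.
parallel_sqrt <- function(x, workers = 2) {
  cl <- makeCluster(workers)
  on.exit({
    stopCluster(cl)  # doParallel does not stop a cluster you created yourself
    registerDoSEQ()  # avoid leaving a stopped cluster registered
  }, add = TRUE)
  registerDoParallel(cl)
  foreach(v = x, .combine = c) %dopar% sqrt(v)
}

res <- parallel_sqrt(c(1, 4, 9))
print(res)
```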