Oli Paul - 2 months ago
R Question

mclapply vs parLapply speeds

I'm running on Linux and used mclapply easily. I run into some errors with parLapply, even after using clusterEvalQ.

Before I go further to resolve the issue, is there any point, i.e. could there be a significant speed difference between the two, or do people just use parLapply when on Windows?

I've read about parLapplyLB and can see the uses of this approach, but if I'm strictly looking at mclapply and parLapply, do the FORK approach and the PSOCK approach vary much in speed?

The nature of my function may determine the answer; it is using stri_extract.

Answer

Some quick benchmarks suggest that mclapply could be slightly faster, but this probably depends on the specific system and problem. The more balanced the jobs and the slower the individual tasks, the less it should matter which function you use.

library(parallel)
library(microbenchmark)

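# Unbalanced jobs: task sizes range from 10^1 to 10^7 draws.
# Note that cluster start-up and shutdown are inside the timed parLapply expression.
microbenchmark(
  parLapply = {
    cl <- makeCluster(2)
    parLapply(cl, rep(1:7, 3), function(x) {set.seed(1); rnorm(10^x)})
    stopCluster(cl)
  },
  mclapply = mclapply(rep(1:7, 3), function(x) {set.seed(1); rnorm(10^x)}, mc.cores = 2),
  times = 10
)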

#Unit: seconds
#     expr     min      lq     mean   median       uq      max neval
#parLapply 1.85548 2.04397 3.332970 3.071284 4.323514 6.294364    10
#mclapply  1.62610 1.65288 2.217407 1.849594 2.243418 5.435189    10


# Balanced jobs: 20 tasks of 10^6 draws each.
microbenchmark(
  parLapply = {
    cl <- makeCluster(2)
    parLapply(cl, rep(6, 20), function(x) {set.seed(1); rnorm(10^x)})
    stopCluster(cl)
  },
  mclapply = mclapply(rep(6, 20), function(x) {set.seed(1); rnorm(10^x)}, mc.cores = 2),
  times = 10
)

#Unit: milliseconds
#     expr      min        lq      mean   median       uq      max neval
#parLapply 1150.657 1188.9750 1705.1364 1242.739 2071.276 3785.516    10
# mclapply  820.692  932.2262  994.4404 1000.402 1079.930 1117.863    10
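
The parLapply timings above include creating and stopping the cluster on every repetition. If you keep one cluster alive and reuse it, that fixed cost is paid only once. The following is a minimal sketch of how one might separate the two; the names task, inputs, t_setup_included and t_reused are made up for illustration, and the task size is reduced so it runs quickly. No timings are shown because they will differ from machine to machine.

```r
library(parallel)

# Same kind of toy task as in the benchmarks, just smaller (assumption: 10^4 draws).
task <- function(x) { set.seed(1); rnorm(10^x) }
inputs <- rep(4, 20)

# Variant A: cluster start-up and shutdown are part of the timed region,
# as in the microbenchmark calls above.
t_setup_included <- system.time({
  cl_tmp <- makeCluster(2)
  res_a <- parLapply(cl_tmp, inputs, task)
  stopCluster(cl_tmp)
})["elapsed"]

# Variant B: the cluster already exists, only the parLapply call is timed.
cl <- makeCluster(2)
t_reused <- system.time({
  res_b <- parLapply(cl, inputs, task)
})["elapsed"]
stopCluster(cl)

# Elapsed seconds for both variants; values depend on the system.
print(c(setup_included = unname(t_setup_included), reused = unname(t_reused)))
```

If the second number is much smaller than the first on your machine, most of the gap to mclapply in a benchmark like the one above would come from cluster start-up rather than from the work itself.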

sessionInfo()
#R version 3.3.1 (2016-06-21)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: Ubuntu 14.04.5 LTS
#
#locale:
# [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
# [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
# [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#
#attached base packages:
#[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
#
#other attached packages:
#[1] microbenchmark_1.4-2.1 doParallel_1.0.10      iterators_1.0.8        foreach_1.4.3         
#
#loaded via a namespace (and not attached):
# [1] colorspace_1.2-6 scales_0.4.0     plyr_1.8.4       tools_3.3.1      gtable_0.2.0     Rcpp_0.12.4     
# [7] ggplot2_2.1.0    codetools_0.2-14 grid_3.3.1       munsell_0.4.3   
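
Regarding the errors with parLapply mentioned in the question: PSOCK workers are fresh R sessions, so unlike the forked processes used by mclapply they do not see the objects or attached packages of the master session. Without knowing the actual error this is only a guess at the cause, but a common pattern is to export the needed objects with clusterExport and attach packages with clusterEvalQ before calling parLapply. The sketch below uses a base R stand-in for the extraction step; pattern, extract_first and strings are hypothetical names, not taken from the question.

```r
library(parallel)

# Hypothetical stand-ins for the real data and extraction function.
pattern <- "[0-9]+"
extract_first <- function(s) regmatches(s, regexpr(pattern, s))
strings <- c("a1", "bb22", "ccc333")

cl <- makeCluster(2)

# Copy the objects the workers need from the master session into each worker.
clusterExport(cl, c("pattern", "extract_first"))

# If the function relied on a package such as stringi, it would also have to
# be attached on every worker, for example:
# clusterEvalQ(cl, library(stringi))

res_psock <- parLapply(cl, strings, function(s) extract_first(s))
stopCluster(cl)

# Serial reference result for comparison.
res_serial <- lapply(strings, extract_first)
print(identical(res_psock, res_serial))
```

With mclapply none of the export steps are needed, which is probably why it worked right away on Linux.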