R Question

Parallel processing with xgboost and caret

I want to parallelize the model fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the nthread parameter controls the number of threads to use while fitting the models, in the sense of building the trees in parallel. Caret's train function, on the other hand, parallelizes across the resampling iterations, for example by running one process per fold of a k-fold CV. Is this understanding correct? If yes, is it better to:


  1. Register the number of cores (for example, with the doMC package and the registerDoMC function), set nthread=1 via caret's train function so that it passes that parameter to xgboost, set allowParallel=TRUE in trainControl, and let caret handle the parallelization for the cross-validation (sketched below); or

  2. Disable caret's parallelization (allowParallel=FALSE and no parallel back-end registration) and set nthread to the number of physical cores, so the parallelization is contained exclusively within xgboost.



Or is there no "better" way to perform the parallelization?
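
For concreteness, a minimal sketch of the two setups as I understand them (the iris data, 5-fold CV, and core counts here are only placeholders, not my actual code):

library(caret)
library(xgboost)
library(doMC)

## Option 1: parallelize the cross-validation in caret, single-threaded xgboost
registerDoMC(cores = 4)                         # core count is illustrative
fit_caret_par <- train(Species ~ ., data = iris,
                       method = "xgbTree",
                       trControl = trainControl(method = "cv", number = 5,
                                                allowParallel = TRUE),
                       nthread = 1)             # passed through to xgboost

## Option 2: no parallel back-end; let xgboost multithread the tree building
fit_xgb_par <- train(Species ~ ., data = iris,
                     method = "xgbTree",
                     trControl = trainControl(method = "cv", number = 5,
                                              allowParallel = FALSE),
                     nthread = 4)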


Edit: I ran the code suggested by @topepo, with tuneLength = 10 and search = "random", and specifying nthread = 1 on the last line (otherwise I understand that xgboost will use multithreading); the exact calls are sketched after the numbers below. These are the results I got:

> xgb_par[3]
elapsed 
283.691 
> just_seq[3]
elapsed 
276.704 
> mc_par[3]
elapsed 
 89.074 
> just_seq[3]/mc_par[3]
elapsed 
3.106451 
> just_seq[3]/xgb_par[3]
elapsed 
0.9753711 
> xgb_par[3]/mc_par[3]
elapsed 
3.184891
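
The timed calls looked roughly like this (reusing the answer's foo() and dat below, with tuneLength changed to 10; the core/thread count of 5 follows the answer's code and may differ from my machine):

just_seq <- system.time(foo())              # no parallel back-end, default nthread
xgb_par  <- system.time(foo(nthread = 5))   # xgboost multithreading only
registerDoMC(cores = 5)
mc_par   <- system.time(foo(nthread = 1))   # caret workers, one xgboost thread each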


In the end, it turned out that both for my data and for this test case, letting caret handle the parallelization was the better choice in terms of runtime.

Answer

It is not simple to project what the best strategy would be. My (biased) thought is that you should parallelize the process that takes the longest. Here, that would be the resampling loop, since an open thread/worker would invoke the model many times. The opposite approach of parallelizing the model fit will start and stop workers repeatedly, which theoretically slows things down. Your mileage may vary.

I don't have OpenMP installed but there is code below to test (if you could report your results, that would be helpful).

library(caret)
library(plyr)
library(xgboost)
library(doMC)

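## helper: fit an xgboost model via random search; extra arguments
## (such as nthread) are passed through to xgboost's fitting call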
foo <- function(...) {
  set.seed(2)
  mod <- train(Class ~ ., data = dat, 
               method = "xgbTree", tuneLength = 50,
               ..., trControl = trainControl(search = "random"))
  invisible(mod)
}

set.seed(1)
dat <- twoClassSim(1000)

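## baseline: no parallel back-end registered, no nthread specified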
just_seq <- system.time(foo())


## parallelize within xgboost via nthread; since I don't have OpenMP
## installed, this likely stays single-threaded
xgb_par <- system.time(foo(nthread = 5))

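## parallelize the resampling loop across 5 caret workers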
registerDoMC(cores=5)
mc_par <- system.time(foo())

My results (without OpenMP)

> just_seq[3]
elapsed 
326.422 
> xgb_par[3]
elapsed 
319.862 
> mc_par[3]
elapsed 
102.329 
> 
> ## Speedups
> xgb_par[3]/mc_par[3]
elapsed 
3.12582 
> just_seq[3]/mc_par[3]
 elapsed 
3.189927 
> just_seq[3]/xgb_par[3]
 elapsed 
1.020509