lemonTea - 8 months ago
R Question

Consistent results with multiple runs of h2o deeplearning

For a certain combination of parameters in the deeplearning function of h2o, I get different results each time I run it.

# Extra parameter sets to try; EPOCHS is defined elsewhere
args <- list(
  list(hidden = c(200, 200, 200), loss = "CrossEntropy",
       hidden_dropout_ratios = c(0.1, 0.1, 0.1),
       activation = "RectifierWithDropout", epochs = EPOCHS))

# Train one model from the fixed settings plus one set of extra parameters
run <- function(extra_params) {
  do.call(h2o.deeplearning, modifyList(
    list(x = columns, y = "Response", training_frame = training,
         validation_frame = validation, distribution = "multinomial",
         l1 = 1e-5, balance_classes = TRUE),
    extra_params))
}

model <- lapply(args, run)

What would I need to do in order to get consistent results for the model each time I run this?

Answer

Deep learning with H2O is not reproducible when it runs on more than a single core: the trained weights, and therefore the performance metrics, may vary slightly each time you train the model. The implementation in H2O uses a technique called "Hogwild!", in which multiple threads update the shared model weights without locking. This increases the speed of training at the cost of reproducibility on multiple cores.

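So if you want reproducible results, you need to restrict H2O to a single core (for example, by starting it with h2o.init(nthreads = 1)) and pass a fixed seed in the h2o.deeplearning call. Setting reproducible = TRUE in that call also forces single-threaded training.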
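As a rough sketch of how that could look with the code from the question: the helper names with_reproducibility and run_reproducible and the seed value 1234 are made up for illustration, while columns, training, validation and args are assumed to exist as in the question. It has not been run against any particular H2O version.

```r
# Add the reproducibility settings to any list of h2o.deeplearning arguments.
# The default seed 1234 is arbitrary; any fixed integer works.
with_reproducibility <- function(params, seed = 1234) {
  modifyList(params, list(
    seed = seed,          # fixes the random number stream used in training
    reproducible = TRUE   # forces single-threaded training (slow on large data)
  ))
}

# Train one model with the settings from the question plus the extras.
# Assumes h2o::h2o.init(nthreads = 1) has already been called and that
# `columns`, `training` and `validation` exist as in the question.
run_reproducible <- function(extra_params) {
  base <- list(x = columns, y = "Response",
               training_frame = training, validation_frame = validation,
               distribution = "multinomial", l1 = 1e-5,
               balance_classes = TRUE)
  do.call(h2o::h2o.deeplearning,
          with_reproducibility(modifyList(base, extra_params)))
}

# Usage (needs the h2o package and a Java runtime):
# h2o::h2o.init(nthreads = 1)   # restrict the cluster to one core
# model <- lapply(args, run_reproducible)
```

Expect training to be noticeably slower this way, since the speed-up from "Hogwild!" comes precisely from using many cores at once.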

More information on "Hogwild!"