Question

How to tune hidden_dropout_ratios in h2o.grid in R

I want to tune a neural network with dropout using h2o in R. Here is a reproducible example for the iris dataset. I'm deliberately not tuning eta and epsilon (i.e. the ADADELTA hyper-parameters), purely to keep computations fast.

require(h2o)
h2o.init()
data(iris)
iris = iris[sample(1:nrow(iris)), ]
irisTrain = as.h2o(iris[1:90, ])
irisValid = as.h2o(iris[91:120, ])
irisTest = as.h2o(iris[121:150, ])
hyper_params <- list(
    input_dropout_ratio = list(0, 0.15, 0.3),
    hidden_dropout_ratios = list(0, 0.15, 0.3, c(0,0), c(0.15,0.15), c(0.3,0.3)),
    hidden = list(64, c(32,32)))

grid = h2o.grid("deeplearning", x=colnames(iris)[1:4], y=colnames(iris)[5],
    training_frame = irisTrain, validation_frame = irisValid,
    hyper_params = hyper_params, adaptive_rate = TRUE,
    variable_importances = TRUE, epochs = 50, stopping_rounds=5,
    stopping_tolerance=0.01, activation=c("RectifierWithDropout"),
    seed=1, reproducible=TRUE)


The output is:

Details: ERRR on field: _hidden_dropout_ratios: Must have 1 hidden layer dropout ratios.


The problem is in hidden_dropout_ratios. Note that I'm including 0 for input_dropout_ratio and hidden_dropout_ratios because I also want to test the activation function without dropout. I'm aware that I could use activation="Rectifier" instead, but I think my configuration should lead to the same result. How do I tune hidden_dropout_ratios when tuning architectures with different numbers of layers?

Attempt 1: Unsuccessful, and I'm not tuning hidden.

hyper_params <- list(
    input_dropout_ratio = c(0, 0.15, 0.3),
    hidden_dropout_ratios = list(c(0.3,0.3), c(0.5,0.5)),
    hidden = c(32,32))

ERRR on field: _hidden_dropout_ratios: Must have 1 hidden layer dropout ratios.


Attempt 2: Successful, but I'm not tuning hidden.

hyper_params <- list(
    input_dropout_ratio = c(0, 0.15, 0.3),
    hidden_dropout_ratios = c(0.3,0.3),
    hidden = c(32,32))
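
I suspect, though I'm not sure, that this comes down to how hyper_params distinguishes vectors from lists: a plain vector seems to be read as a set of scalar candidate values, while a list is read as a set of candidates that may themselves be vectors. A minimal sketch of that reading (the names are mine, just for illustration):

# Assumption about how h2o.grid parses hyper_params, not verified against the docs:
hyper_params_a <- list(hidden = c(32, 64))            # two candidates, each a one-layer network
hyper_params_b <- list(hidden = list(c(32, 32), 64))  # a two-layer candidate and a one-layer candidate

If that is right, it would explain why Attempt 1 fails: hidden = c(32,32) proposes one-layer networks, which cannot accept the two-element dropout vectors.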

Answer

You have to fix the number of hidden layers in a grid when experimenting with hidden_dropout_ratios. At first I messed around with combining multiple grids; then, while researching for my H2O book, I saw someone mention in passing that grids get combined automatically if you give them the same name.

So, you still need to call h2o.grid() for each number of hidden layers, but they can all be in the same grid at the end. Here is your example modified for that:

require(h2o)
h2o.init()
data(iris)
iris = iris[sample(1:nrow(iris)), ]
irisTrain = as.h2o(iris[1:90, ])
irisValid = as.h2o(iris[91:120, ])
irisTest = as.h2o(iris[121:150, ])

hyper_params1 <- list(
    hidden_dropout_ratios = list(0, 0.15, 0.3),
    hidden = list(64)
    )

hyper_params2 <- list(
    hidden_dropout_ratios = list(c(0,0), c(0.15,0.15),c(0.3,0.3)),
    hidden = list(c(32,32))
    )

grid = h2o.grid("deeplearning", x=colnames(iris)[1:4], y=colnames(iris)[5],
    grid_id = "stackoverflow",
    training_frame = irisTrain, validation_frame = irisValid,
    hyper_params = hyper_params1, adaptive_rate = TRUE,
    variable_importances = TRUE, epochs = 50, stopping_rounds=5,
    stopping_tolerance=0.01, activation=c("RectifierWithDropout"),
    seed=1, reproducible=TRUE)

grid = h2o.grid("deeplearning", x=colnames(iris)[1:4], y=colnames(iris)[5],
    grid_id = "stackoverflow",
    training_frame = irisTrain, validation_frame = irisValid,
    hyper_params = hyper_params2, adaptive_rate = TRUE,
    variable_importances = TRUE, epochs = 50, stopping_rounds=5,
    stopping_tolerance=0.01, activation=c("RectifierWithDropout"),
    seed=1, reproducible=TRUE)
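
If you want to check that the two calls really did land in one grid, you can fetch it by id; it should report all six models (a quick sanity check, assuming both calls above succeeded):

grid <- h2o.getGrid("stackoverflow")
length(grid@model_ids)  # 6: three one-layer models plus three two-layer models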

When I went to print the grid, I was reminded that there is a bug in grid output when using list hyper-parameters such as hidden or hidden_dropout_ratios. Your code is a nice self-contained example, so I'll report that now. In the meantime, here is a short snippet to show the values of those hyper-parameters for each model:

# Fetch each model in the grid, then pull out its hidden-layer sizes
# and dropout ratios
models <- lapply(grid@model_ids, h2o.getModel)
sapply(models, function(m) c(
  paste(m@parameters$hidden, collapse = ","),
  paste(m@parameters$hidden_dropout_ratios, collapse = ",")
  ))

Which gives:

     [,1]    [,2] [,3]        [,4]   [,5]      [,6] 
[1,] "32,32" "64" "32,32"     "64"   "32,32"   "64" 
[2,] "0,0"   "0"  "0.15,0.15" "0.15" "0.3,0.3" "0.3"

That is, because the grid is sorted best-first, no hidden dropout is better than a little, which is better than a lot; and two hidden layers are better than one.
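
If you want to make that ordering explicit rather than relying on the default sort, you can re-fetch the grid sorted on a metric of your choice; logloss is a natural fit here, since iris is a multinomial problem:

sortedGrid <- h2o.getGrid("stackoverflow", sort_by = "logloss", decreasing = FALSE)
sortedGrid@summary_table  # one row per model, best (lowest logloss) first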

P.S. The only other change I made to your code was to remove input_dropout_ratio, as I guessed you intended it to be zero.

  • input_dropout_ratio: controls dropout between the input layer and the first hidden layer. It can be used independently of the activation function.
  • hidden_dropout_ratios: controls dropout between each hidden layer and the next layer (which is either the next hidden layer or the output layer), with one ratio per hidden layer; see the sketch below. If specified, you must use one of the "WithDropout" activation functions.
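
To make that length requirement concrete, here is a minimal non-grid sketch; the layer sizes and ratios are placeholders, not recommendations:

# Three hidden layers, so hidden_dropout_ratios needs exactly three entries,
# and a "WithDropout" activation is required.
model <- h2o.deeplearning(
    x = colnames(iris)[1:4], y = colnames(iris)[5],
    training_frame = irisTrain, validation_frame = irisValid,
    activation = "RectifierWithDropout",
    hidden = c(64, 32, 16),
    hidden_dropout_ratios = c(0.2, 0.2, 0.2),
    input_dropout_ratio = 0.1  # input dropout works with any activation
)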