Stereo Stereo - 1 month ago 32
Python Question

H2O Python: Extract the grid search model with the highest AUC on the validation data set

I am building a Random Forest model using a grid search with the H2O Python API. I split the data into training and validation sets and use k-fold cross-validation to select the best model in the grid search.

I am able to retrieve the model with the best MSE on the training set, but I want to retrieve the model with the highest AUC on the validation set.

I could code everything in Python, but I was wondering whether there is an H2O approach to solve this. Any suggestions on how I could do this?

Answer

If g is your grid object, then:

g.sort_by('auc', False)

will give you the models ordered by AUC. Passing False as the second parameter sorts in descending order, so the model with the highest AUC comes first. The call returns an H2OTwoDimTable object, so the first row of that table is the best model by AUC.

I believe it sorts based on scores on the validation set, not the training set. However, you can specify this explicitly with:

g.sort_by('auc(valid=True)', False)
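
If you would rather get the model object itself instead of reading it off a table, the "code it in Python" fallback from the question is short. This is only a sketch, not the grid's built-in sorting: the helper name best_model_by_validation_auc is made up for illustration, and it assumes that the trained grid exposes its models as g.models and that each binomial model accepts auc(valid=True), so check both against your H2O version.

```python
def best_model_by_validation_auc(grid):
    """Return the model in `grid` with the highest validation AUC.

    Assumptions (verify against your H2O version):
      - `grid.models` is the list of models trained by the grid search
      - each model has an `auc(valid=True)` method returning a float,
        which requires that a validation frame was passed when training
    """
    return max(grid.models, key=lambda model: model.auc(valid=True))


# Hypothetical usage, where g is the trained grid object from above:
#   best = best_model_by_validation_auc(g)
#   print(best.model_id, best.auc(valid=True))
```

Because the helper only touches those two attributes, it does not depend on how the grid's own sort orders its table.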