Dreamer - 5 months ago
Python Question

Held-out training and validation set in GridSearchCV (sklearn)

I see that in GridSearchCV the best parameters are determined based on cross-validation, but what I really want to do is determine the best parameters based on one held-out validation set instead of cross-validation.

I'm not sure if there is a way to do that. I found some similar posts about customizing the cross-validation folds. However, what I really need is to train on one set and validate the parameters on a separate validation set.

One more piece of information: my dataset is basically a text Series created with pandas.

Answer

I did come up with an answer to my own question by using PredefinedSplit:

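import numpy as np
from sklearn.model_selection import PredefinedSplit

# -1 marks samples that are never used for validation (always training)
train_ind = np.full(len(doc_train), -1)
# 0 marks samples that belong to the single validation fold
val_ind = np.zeros(len(doc_val), dtype=int)

ps = PredefinedSplit(test_fold=np.concatenate((train_ind, val_ind)))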

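and then pass it as the cv argument of GridSearchCV. Note that fit must then be called on the training and validation data concatenated in the same order as test_fold (training samples first, then validation samples):

grid_search = GridSearchCV(pipeline, parameters, n_jobs=7, verbose=1, cv=ps)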
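For anyone who wants to try this end to end, here is a minimal sketch of the same idea on toy data. The documents, labels, pipeline steps, and parameter grid are all made up for illustration (the original post does not show them), and refit=False is an optional choice rather than part of the answer above.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.pipeline import Pipeline

# Hypothetical toy data: training documents and held-out validation documents.
doc_train = ["good movie", "great film", "bad movie",
             "awful film", "good great film", "bad awful movie"]
y_train = [1, 1, 0, 0, 1, 0]
doc_val = ["great movie", "awful movie"]
y_val = [1, 0]

# -1 = always in the training set; 0 = member of the single validation fold.
test_fold = np.concatenate((np.full(len(doc_train), -1),
                            np.zeros(len(doc_val), dtype=int)))
ps = PredefinedSplit(test_fold=test_fold)

# The splitter yields exactly one (train indices, validation indices) pair.
splits = list(ps.split())
train_idx, val_idx = splits[0]

# Assumed pipeline and grid, chosen only to make the example runnable.
pipeline = Pipeline([("vect", CountVectorizer()),
                     ("clf", LogisticRegression())])
parameters = {"clf__C": [0.1, 1.0, 10.0]}

# refit=False skips the final retraining on train + validation combined.
grid_search = GridSearchCV(pipeline, parameters, cv=ps, refit=False)

# Fit on the concatenated data, in the same order used to build test_fold.
grid_search.fit(doc_train + doc_val, y_train + y_val)
best_params = grid_search.best_params_
```

With the default refit=True, GridSearchCV would retrain the best configuration on the training and validation samples together after the search, which may or may not be what you want when the validation set is meant to stay held out.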