What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?

I am confused about the difference between the cross_val_score scoring metric 'roc_auc' and the roc_auc_score that I can just import and call directly.

The documentation (http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter) indicates that specifying scoring='roc_auc' will use sklearn.metrics.roc_auc_score. However, when I run GridSearchCV or cross_val_score with scoring='roc_auc', I get very different numbers than when I call roc_auc_score directly.

Here is my code to help demonstrate what I see:

# score the model using cross_val_score

rf = RandomForestClassifier(n_estimators=150,

scores = cross_val_score(rf, X, y, cv=3, scoring='roc_auc')

print scores
array([ 0.9649023 , 0.96242235, 0.9503313 ])

# do a train_test_split, fit the model, and score with roc_auc_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
rf.fit(X_train, y_train)

print roc_auc_score(y_test, rf.predict(X_test))
0.84634039111363313 # quite a bit different than the scores above!

I feel like I am missing something very simple here -- most likely a mistake in how I am implementing/interpreting one of the scoring metrics.

Can anyone shed any light on the reason for the discrepancy between the two scoring metrics?

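This is because you passed predicted class labels, not probabilities, to roc_auc_score. The function expects a continuous score for the positive class (a probability or a decision value), not the hard 0/1 output of rf.predict. With only two distinct values, the ROC curve has a single threshold point, so the resulting AUC is usually lower. The 'roc_auc' scorer used by cross_val_score works from the model's probabilities or decision scores, which is why its numbers differ. Try this instead: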

print roc_auc_score(y_test, rf.predict_proba(X_test)[:,1])

This should give a result close to the cross_val_score results above. Refer to this post for more info.

