Bill Ancalagon the black - 2 months ago
Python Question

boxplot (from seaborn) would not plot as expected

The seaborn boxplot does not come out as expected.
This is what it actually plots:

[image: the plot actually produced]

This is what it is supposed to plot:
[image: the expected plot]

This is the code and data:

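import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20;
# cross_val_score now lives in sklearn.model_selection
from sklearn.model_selection import cross_val_score

# X (features) and Y (labels) are assumed to be defined already
scores = []
for ne in range(1, 41):  # ne is the number of trees
    clf = RandomForestClassifier(n_estimators=ne)
    score_list = cross_val_score(clf, X, Y, cv=10)  # 10 scores, one per fold
    scores.append(score_list)
sns.boxplot(scores)  # scores is a list of arrays
plt.xlabel('Number of trees')
plt.ylabel('Classification score')
plt.title('Classification score as a function of the number of trees')
plt.show()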

scores =

[array([ 0.8757764 , 0.86335404, 0.75625 , 0.85 , 0.86875 ,
0.81875 , 0.79375 , 0.79245283, 0.8490566 , 0.85534591]),
array([ 0.89440994, 0.8447205 , 0.79375 , 0.85 , 0.8625 ,
0.85625 , 0.86875 , 0.88050314, 0.86792453, 0.8427673 ]),
array([ 0.91304348, 0.9068323 , 0.83125 , 0.84375 , 0.8875 ,
0.875 , 0.825 , 0.83647799, 0.83647799, 0.87421384]),
array([ 0.86956522, 0.86956522, 0.85 , 0.875 , 0.88125 ,
0.86875 , 0.8625 , 0.8490566 , 0.86792453, 0.89308176]),


....]

Answer

I would first create a pandas DataFrame out of scores:

import pandas as pd

In [15]: scores
Out[15]:
[array([ 0.8757764 ,  0.86335404,  0.75625   ,  0.85      ,  0.86875   ,  0.81875   ,  0.79375   ,  0.79245283,  0.8490566 ,  0.85534591]),
 array([ 0.89440994,  0.8447205 ,  0.79375   ,  0.85      ,  0.8625    ,  0.85625   ,  0.86875   ,  0.88050314,  0.86792453,  0.8427673 ]),
 array([ 0.91304348,  0.9068323 ,  0.83125   ,  0.84375   ,  0.8875    ,  0.875     ,  0.825     ,  0.83647799,  0.83647799,  0.87421384]),
 array([ 0.86956522,  0.86956522,  0.85      ,  0.875     ,  0.88125   ,  0.86875   ,  0.8625    ,  0.8490566 ,  0.86792453,  0.89308176])]

In [16]: df = pd.DataFrame(scores)

In [17]: df
Out[17]:
          0         1        2        3        4        5        6         7         8         9
0  0.875776  0.863354  0.75625  0.85000  0.86875  0.81875  0.79375  0.792453  0.849057  0.855346
1  0.894410  0.844720  0.79375  0.85000  0.86250  0.85625  0.86875  0.880503  0.867925  0.842767
2  0.913043  0.906832  0.83125  0.84375  0.88750  0.87500  0.82500  0.836478  0.836478  0.874214
3  0.869565  0.869565  0.85000  0.87500  0.88125  0.86875  0.86250  0.849057  0.867925  0.893082
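A note on orientation: in the frame above, each row is one entry of scores (one value of n_estimators) and each column is one cross-validation fold. The sketch below shows one possible way to label the rows and to reshape the frame. It uses made-up random numbers in place of the real scores, and the names tree_counts, by_trees and long_df are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real `scores` list: 4 tree counts x 10 folds.
rng = np.random.default_rng(0)
tree_counts = [1, 2, 3, 4]
scores = [rng.uniform(0.75, 0.92, size=10) for _ in tree_counts]

# One row per tree count, one column per cross-validation fold.
df = pd.DataFrame(scores, index=pd.Index(tree_counts, name="n_estimators"))
df.columns.name = "fold"

# Transposed view: one column per tree count, one row per fold.
by_trees = df.T

# Long ("tidy") form: one row per (tree count, fold) pair.
long_df = df.reset_index().melt(
    id_vars="n_estimators", var_name="fold", value_name="score"
)

print(df.shape, by_trees.shape, long_df.shape)
```

Which of the three layouts is most convenient depends on how the plot should be grouped; the wide frames suit a data=... call, while the long form suits explicit x= and y= arguments.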

Now we can easily plot the boxplots:

In [18]: sns.boxplot(data=df)
Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0xd121128>

[image: the resulting boxplot]