ben_aaron - 3 months ago
Python Question

Convert a console-friendly string into a usable pandas DataFrame

A quick question as I'm currently changing from R to pandas for some projects:

I get the following printed output from metrics.classification_report in scikit-learn:

             precision    recall  f1-score   support

          0       0.67      0.67      0.67         3
          1       0.50      1.00      0.67         1
          2       1.00      0.80      0.89         5

avg / total       0.83      0.78      0.79         9


I want to use this (and similar reports) as a matrix/DataFrame so that I can subset it to extract, say, the precision of class 0.

In R, I'd give the first "column" a name like 'outcome_class' and then subset it:
my_dataframe[my_dataframe$outcome_class == 1, 'precision']


I can do the same in pandas, but the "dataframe" that I want to use is simply a string (see scikit-learn's docs).

How can I turn the table output above into a usable DataFrame in pandas?

Answer

Assign it to a variable, s:

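from sklearn.metrics import classification_report
s = classification_report(y_true, y_pred, target_names=target_names)  # y_true, y_pred and target_names come from your own evaluation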

Or directly:

s = '''
             precision    recall  f1-score   support

    class 0       0.50      1.00      0.67         1
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.67      0.80         3

avg / total       0.70      0.60      0.61         5
'''

Use that as the string input for StringIO:

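import io  # For Python 2.x use import StringIO
import pandas as pd
# A regex separator (two or more spaces) requires the Python parsing engine
df = pd.read_table(io.StringIO(s), sep=r'\s{2,}', engine='python')  # For Python 2.x use StringIO.StringIO(s)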
df
Out: 
             precision  recall  f1-score  support
class 0            0.5    1.00      0.67        1
class 1            0.0    0.00      0.00        1
class 2            1.0    0.67      0.80        3
avg / total        0.7    0.60      0.61        5
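
Since the question mentions "similar" reports, the parsing step could be wrapped in a small helper. This is only a sketch: the function name report_to_frame is made up for illustration, and it assumes the report keeps this layout (a header row with one field fewer than the data rows, and columns separated by at least two spaces).

```python
import io

import pandas as pd


def report_to_frame(report_text):
    """Parse a classification_report-style text table into a DataFrame.

    Hypothetical helper, not part of pandas or scikit-learn.
    """
    # Splitting on two or more spaces keeps labels that contain single
    # spaces (such as "avg / total") in one piece. Because the header has
    # one field fewer than the data rows, pandas uses the first column of
    # each data row as the index.
    return pd.read_csv(io.StringIO(report_text), sep=r"\s{2,}", engine="python")


sample = """
             precision    recall  f1-score   support

    class 0       0.50      1.00      0.67         1
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.67      0.80         3

avg / total       0.70      0.60      0.61         5
"""

frame = report_to_frame(sample)
print(frame.loc["class 0", "precision"])
```

If a report has row labels separated from the numbers by only one space, or rows with missing cells, this approach would probably need adjusting.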

Now you can slice it like an R data.frame:

df.loc['class 2']['f1-score']
Out: 0.80000000000000004
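
The chained lookup above works for reading a value, but passing the row and column labels to a single .loc call is usually the more idiomatic form. A small sketch, with the frame typed in by hand so that it runs on its own (the values are copied from the table above):

```python
import pandas as pd

# Hand-built copy of the parsed report, so the example is self-contained.
df = pd.DataFrame(
    {
        "precision": [0.50, 0.00, 1.00, 0.70],
        "recall": [1.00, 0.00, 0.67, 0.60],
        "f1-score": [0.67, 0.00, 0.80, 0.61],
        "support": [1, 1, 3, 5],
    },
    index=["class 0", "class 1", "class 2", "avg / total"],
)

# Chained: first pull out the whole row as a Series, then pick one entry.
chained = df.loc["class 2"]["f1-score"]

# Single step: row label and column label in one .loc call.
direct = df.loc["class 2", "f1-score"]

# The row Series mixes float and int columns, so it is upcast to float64;
# the single-step lookup returns the integer stored in the column.
support_chained = df.loc["class 2"]["support"]
support_direct = df.loc["class 2", "support"]

print(chained, direct, support_chained, support_direct)
```

Both forms return the same f1-score here; the difference only shows up in the dtype of values such as support, and when assigning rather than reading.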

Here, classes are the index of the DataFrame. You can use reset_index() if you want to use it as a regular column:

df = df.reset_index().rename(columns={'index': 'outcome_class'})
df.loc[df['outcome_class']=='class 1', 'support']
Out: 
1    1
Name: support, dtype: int64
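
Note that the boolean-mask selection above returns a Series, not a bare number. If a single value is wanted, as in the R example from the question, one option is to take the first match. A minimal sketch on a hand-built frame (values copied from the table above):

```python
import pandas as pd

# Hand-built copy of the frame after reset_index() and rename().
df = pd.DataFrame(
    {
        "outcome_class": ["class 0", "class 1", "class 2", "avg / total"],
        "precision": [0.50, 0.00, 1.00, 0.70],
        "support": [1, 1, 3, 5],
    }
)

# Boolean mask on the class column, then pick one metric column.
selected = df.loc[df["outcome_class"] == "class 1", "support"]

# .iloc[0] takes the first matching row (IndexError if nothing matched).
first_value = selected.iloc[0]

# .item() insists on exactly one match (ValueError otherwise).
only_value = selected.item()

# Same pattern for the precision of class 0, as asked in the question.
precision_class_0 = df.loc[df["outcome_class"] == "class 0", "precision"].iloc[0]

print(first_value, only_value, precision_class_0)
```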