Erdem KAYA Erdem KAYA - 2 months ago 22
Python Question

How to get odds-ratios and other related features with scikit-learn

I'm going through this odds ratios in logistic regression tutorial, and trying to get the exactly the same results with the logistic regression module of scikit-learn. With the code below, I am able to get the coefficient and intercept but I could not find a way to find other properties of the model listed in the tutorial such as log-likelyhood, Odds Ratio, Std. Err., z, P>|z|, [95% Conf. Interval]. If someone could show me how to have them calculated with

sklearn
package, I would appreciate it.

import pandas as pd
from sklearn import linear_model

url = 'http://www.ats.ucla.edu/stat/mult_pkg/faq/general/sample.csv'
df = pd.read_csv(url, na_values=[''])
y = df.hon.values
X = df.math.values
y = y.reshape(200,1)
X = X.reshape(200,1)
clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X,y)
clf.coef_
clf.intercept_

Answer

You can get the odds ratios by taking the exponent of the coeffecients:

import numpy as np
X = df.female.values.reshape(200,1)
clf.fit(X,y)
np.exp(clf.coef_)

# array([[ 1.80891307]])

As for the other statistics, these are not easy to get from scikit-learn (where model evaluation is mostly done using cross-validation), if you need them you're better off using a different library such as statsmodels.

Comments