Erdem KAYA - 5 months ago 72

Python Question

I'm going through this tutorial on odds ratios in logistic regression and trying to get exactly the same results with the logistic regression module of scikit-learn. With the code below, I am able to get the coefficient and intercept, but I could not find a way to get the other properties of the model listed in the tutorial, such as *log-likelihood, Odds Ratio, Std. Err., z, P>|z|, [95% Conf. Interval]*. Could someone show me how to calculate them with `sklearn`?

```
import pandas as pd
from sklearn import linear_model

url = 'http://www.ats.ucla.edu/stat/mult_pkg/faq/general/sample.csv'
df = pd.read_csv(url, na_values=[''])

y = df.hon.values                   # binary outcome; scikit-learn expects a 1-D target
X = df.math.values.reshape(-1, 1)   # single predictor as a 2-D column

# A large C makes the default L2 penalty negligible
clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X, y)

clf.coef_
clf.intercept_
```

Answer

You can get the odds ratios by exponentiating the coefficients:

```
import numpy as np
X = df.female.values.reshape(200,1)
clf.fit(X,y)
np.exp(clf.coef_)
# array([[ 1.80891307]])
```
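To see why exponentiating works, note that for a single binary predictor such as `female` the fitted coefficient is the difference in log-odds between the two groups, so its exponential is the ratio of the two odds. A minimal sketch of that arithmetic follows; the counts are hypothetical and are not taken from the tutorial's data:

```python
import math

# Hypothetical 2x2 counts (NOT the tutorial data): outcome by a binary predictor
counts = {
    0: {"y1": 20, "y0": 80},   # predictor = 0: 20 with outcome 1, 80 with outcome 0
    1: {"y1": 30, "y0": 70},   # predictor = 1: 30 with outcome 1, 70 with outcome 0
}

# Odds of the outcome within each group
odds_0 = counts[0]["y1"] / counts[0]["y0"]
odds_1 = counts[1]["y1"] / counts[1]["y0"]

# With one binary predictor, the maximum-likelihood fit reproduces the group odds:
# the intercept is the log-odds of group 0, and the coefficient is the
# difference in log-odds between group 1 and group 0.
intercept = math.log(odds_0)
coef = math.log(odds_1) - math.log(odds_0)

# Exponentiating the coefficient recovers the odds ratio
odds_ratio = math.exp(coef)
print(odds_ratio)
```

The same value can be read directly off the table as `(30 * 80) / (20 * 70)`, which is a quick sanity check on a fitted model with a binary predictor.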

As for the other statistics, these are not easy to get from scikit-learn, where model evaluation is mostly done using cross-validation. If you need them, you're better off using a different library such as `statsmodels`.