MrROY - 4 months ago

Python Question

I have a training data set like this:

```
0.00479616 | 0.0119904 | 0.00483092 | 0.0120773 | 1
0.51213136 | 0.0113404 | 0.02383092 | -0.012073 | 0
0.10479096 | -0.011704 | -0.0453692 | 0.0350773 | 0
```

The first four columns are the features of one sample, and the last column is its output (the class label).
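If the rows above are kept as pipe-separated text, the following is one possible sketch for turning them into the `data` array used below. The `raw` string and the `parse_rows` helper are hypothetical illustrations, not part of the original question; it simply assumes one sample per line with fields separated by `|`.

```python
import numpy as np

# Hypothetical raw text holding the three sample rows from the question.
raw = """
0.00479616 | 0.0119904 | 0.00483092 | 0.0120773 | 1
0.51213136 | 0.0113404 | 0.02383092 | -0.012073 | 0
0.10479096 | -0.011704 | -0.0453692 | 0.0350773 | 0
"""


def parse_rows(text):
    """Hypothetical helper: split each non-empty line on '|' and convert every field to float."""
    rows = []
    for line in text.strip().splitlines():
        # float() ignores the spaces around each '|' separator
        rows.append([float(field) for field in line.split("|")])
    return np.array(rows)


data = parse_rows(raw)
X = data[:, :-1]  # first four columns: features
Y = data[:, -1]   # last column: output label
print(data.shape)  # (3, 5)
```

Splitting by hand keeps the sketch independent of any particular file format; for a real file, a CSV reader with `|` as the delimiter would do the same job.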

I use scikit-learn like this:

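```
import numpy as np
from sklearn import linear_model

data = np.array(data)  # rows of [f1, f2, f3, f4, label]

lr = linear_model.LogisticRegression(C=10)
X = data[:, :-1]  # features: all columns but the last
Y = data[:, -1]   # output: the last column
lr.fit(X, Y)
print(lr)

# The output is always 1 or 0, not a probability number.
print(lr.predict(data[:1, :-1]))  # 2-D slice: one sample, four features
```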

I thought logistic regression should always give a probability between 0 and 1.

Answer

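Use the `predict_proba` method to get probabilities. `predict` gives class labels: for a binary problem like yours, it returns the more probable class, which amounts to thresholding P(y=1) at 0.5.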

```
>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> lr = LogisticRegression()
>>> X = np.random.randn(3, 4)
>>> y = [1, 0, 0]
>>> lr.fit(X, y)
LogisticRegression()
>>> lr.predict_proba(X[:1])  # 2-D slice; the values depend on the random X
array([[ 0.49197272, 0.50802728]])
```
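To see how the two methods relate, here is a small sketch on the three rows from the question (a toy data set, so the fitted numbers themselves mean little). It checks two properties of a binary scikit-learn `LogisticRegression`: the second column of `predict_proba` is the logistic sigmoid of `decision_function`, and `predict` returns the class with the highest probability. The variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# The three sample rows from the question: four features, then the label.
data = np.array([
    [0.00479616, 0.0119904, 0.00483092, 0.0120773, 1],
    [0.51213136, 0.0113404, 0.02383092, -0.012073, 0],
    [0.10479096, -0.011704, -0.0453692, 0.0350773, 0],
])
X, y = data[:, :-1], data[:, -1]

lr = LogisticRegression(C=10).fit(X, y)

# One row per sample, one column per class, in the order of lr.classes_.
proba = lr.predict_proba(X)

# For a binary model, P(class 1) is the sigmoid of the decision function.
scores = lr.decision_function(X)
p_positive = 1.0 / (1.0 + np.exp(-scores))

# predict() picks the most probable class, i.e. a 0.5 threshold on P(class 1).
labels = lr.predict(X)
labels_from_proba = lr.classes_[proba.argmax(axis=1)]

print(lr.classes_)
print(proba)
print(np.allclose(proba[:, 1], p_positive))       # expected: True
print(np.array_equal(labels, labels_from_proba))  # expected: True
```

If you need a different cut-off than 0.5, compare `proba[:, 1]` against your own threshold instead of calling `predict`.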

(Both methods are described in the `LogisticRegression` documentation.)