ExtremistEnigma - 1 year ago 259

Python Question

I'm implementing a multinomial logistic regression model in Python using Scikit-learn. Here's my code:

`X = pd.concat([each for each in feature_cols], axis=1)`

`y = train[["<5", "5-6", "6-7", "7-8", "8-9", "9-10"]]`

`lm = LogisticRegression(multi_class='multinomial', solver='lbfgs')`

`lm.fit(X, y)`

However, I'm getting `ValueError: bad input shape (50184, 6)`. `X` is a `DataFrame`, and `y` is the six-column `DataFrame` selected above, with shape (50184, 6).

I ultimately want to predict in what bin (<5, 5-6, etc.) the outcome falls. All the independent and dependent variables used in this case are dummy columns which have a binary value of either 0 or 1. What am I missing?

Answer

The Logistic Regression 3-class Classifier example shows that `LogisticRegression` is fitted with a vector rather than a matrix as the target: in that case, the `target` variable of the `iris` dataset, coded as the values `[0, 1, 2]`.
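As a minimal sketch of that point, the snippet below fits the classifier on the `iris` data and prints the target's shape. The `max_iter=1000` setting is an assumption added here so that `lbfgs` converges without a warning. The `multi_class` argument is left out because recent scikit-learn releases already use the multinomial formulation by default with `lbfgs`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target  # X is a (150, 4) matrix, y is a (150,) vector

# The target holds one integer class label per row, not one column per class
clf = LogisticRegression(solver="lbfgs", max_iter=1000)
clf.fit(X, y)

print(y.shape)       # (150,)
print(clf.classes_)  # [0 1 2]
```

Passing a 2-D dummy matrix as `y` is what triggers the shape error in the question; the classifier expects the 1-D form shown here.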

To convert the dummy matrix to a series, you could multiply each column by a different integer and then, assuming it's a `pandas.DataFrame`, just call `.sum(axis=1)` on the result. Something like:

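```
y = y.copy()  # work on a copy so the slice of `train` is not modified
# Scale each dummy column by its 1-based position, so bin k is coded as k
for i, col in enumerate(y.columns.tolist(), 1):
    y.loc[:, col] *= i
# Each row has a single non-zero entry, so the row sum is the class label
y = y.sum(axis=1)
```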
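An alternative that is not part of the original answer, sketched under the assumption that every row has exactly one 1: `DataFrame.idxmax(axis=1)` returns the name of the column holding the row maximum, which gives the bin label directly. The `dummies` frame below is made-up data standing in for `train[[...]]`.

```python
import pandas as pd

bins = ["<5", "5-6", "6-7", "7-8", "8-9", "9-10"]

# Hypothetical stand-in for the dummy target columns: one 1 per row
dummies = pd.DataFrame(
    [[1, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [0, 1, 0, 0, 0, 0]],
    columns=bins,
)

# Sanity check: each row must fall into exactly one bin
assert (dummies.sum(axis=1) == 1).all()

# Column name of the row-wise maximum, i.e. the bin label as a string
labels = dummies.idxmax(axis=1)
print(labels.tolist())  # ['<5', '6-7', '9-10', '5-6']

# Zero-based integer codes, if numeric labels are preferred
codes = dummies.to_numpy().argmax(axis=1)
print(codes.tolist())  # [0, 2, 5, 1]
```

Either `labels` or `codes` is a 1-D target that can be passed as `y` to `fit`. scikit-learn classifiers accept string class labels, so with `labels` the predictions come back as bin names.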

Source (Stackoverflow)