ExtremistEnigma - 11 months ago
Python Question

Scikit-learn - Bad input shape error on multinomial logistic regression

I'm implementing a multinomial logistic regression model in Python using Scikit-learn. Here's my code:

X = pd.concat([each for each in feature_cols], axis=1)
y = train[["<5", "5-6", "6-7", "7-8", "8-9", "9-10"]]
lm = LogisticRegression(multi_class='multinomial', solver='lbfgs')
lm.fit(X, y)

However, I'm getting
ValueError: bad input shape (50184, 6)
when it tries to execute the last line of code.

X is a DataFrame with 50184 rows, 7 columns.
y also has 50184 rows, but 6 columns.

I ultimately want to predict in what bin (<5, 5-6, etc.) the outcome falls. All the independent and dependent variables used in this case are dummy columns which have a binary value of either 0 or 1. What am I missing?


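The Logistic Regression 3-class Classifier example in the scikit-learn documentation shows that LogisticRegression.fit expects the target as a 1-D vector of class labels rather than a matrix. In that example the target is the iris dataset's target variable, coded as values [0, 1, 2]. Your y is a 6-column dummy matrix, which is why fit raises the bad input shape error.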
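As a quick check of what that example's target looks like, here is a minimal sketch that loads the iris data with sklearn.datasets.load_iris and inspects the target. The variable names are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the iris features and target as plain NumPy arrays.
X_iris, y_iris = load_iris(return_X_y=True)

# The target is a 1-D vector with one integer class label per row.
target_shape = y_iris.shape
target_classes = np.unique(y_iris).tolist()

print(target_shape)    # (150,)
print(target_classes)  # [0, 1, 2]
```

The shape (150,) is what fit expects for y: one label per row, not one column per class.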

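To convert the dummy matrix to a single series of class labels, you could multiply each column by a different integer and then, assuming it's a pandas.DataFrame, call .sum(axis=1) on the result. Since each row has exactly one 1, the sum is the integer of the bin that row belongs to. Something like: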

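# Work on a copy so the columns of `train` are not modified
# (and to avoid pandas' SettingWithCopyWarning).
y = y.copy()
for i, col in enumerate(y.columns.tolist(), 1):
    y.loc[:, col] *= i
y = y.sum(axis=1)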
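An alternative that keeps the bin names as labels is sketched below. It assumes every row has exactly one 1 among the dummy columns, and it uses small made-up data in place of your train frame (the names f0..f6, y_dummies and y_labels are only illustrative). DataFrame.idxmax(axis=1) returns, for each row, the name of the column holding the 1. The multi_class argument is left out here because recent scikit-learn versions already fit a multinomial model for multiclass targets with the lbfgs solver.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up stand-in for the question's data: 7 binary features, 6 one-hot bins.
rng = np.random.default_rng(0)
bins = ["<5", "5-6", "6-7", "7-8", "8-9", "9-10"]
n_rows = 120
X = pd.DataFrame(rng.integers(0, 2, size=(n_rows, 7)),
                 columns=[f"f{i}" for i in range(7)])

# Cycle through the bins so that every class appears in the toy target.
codes = np.arange(n_rows) % len(bins)
y_dummies = pd.DataFrame(np.eye(len(bins), dtype=int)[codes], columns=bins)

# Collapse the dummy matrix: each row has a single 1, so idxmax
# returns the name of the column (bin) that row belongs to.
y_labels = y_dummies.idxmax(axis=1)

lm = LogisticRegression(solver="lbfgs", max_iter=1000)
lm.fit(X, y_labels)          # y_labels is 1-D, so no shape error
pred = lm.predict(X.iloc[:5])
```

With string labels, predict returns the bin names directly, so no mapping back from integers is needed.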