Jack Sullivan - 1 month ago
Python Question

Simple Logistic Regression Error in Python

Here is the line of code. I know the issue is that I am passing a 1-d array, but I cannot figure out how to cast it to a 2-d array inline.

def classification_model(model, data, predictors, outcome):
    model.fit(data[predictors], data[outcome])


where data has been read from a .csv file with Pandas, so data[predictors] for a single column name is a 1-d array.

The classification_model() function is invoked like this:
classification_model(LogisticRegression(), data, 'HvA', 'FTR')

where FTR and HvA are column names in the .csv and therefore column labels in my data object (a Pandas DataFrame).

Trace is:
Traceback (most recent call last):
  File "Predict.py", line 112, in <module>
    classification_model(LogisticRegression(), reader, 'HvA', 'FTR')
  File "Predict.py", line 15, in classification_model
    model.fit(data[predictors],data[outcome])
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
    order="C")
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 531, in check_X_y
    check_consistent_length(X, y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 181, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [1, 370]


The heading line and first line of data from the .csv file:

FTHG FTAG FTR HTHG HTAG HTR HS AS HST AST HF AF HC AC HY AY HR AR VCH VCD VCA Bb1X2 BbMxH BbAvH BbMxD BbAvD BbMxA BbAvA BbOU BbMx>2.5 BbAv>2.5 BbMx<2.5 BbAv<2.5 BbAH BbAHh BbMxAHH BbAvAHH BbMxAHA BbAvAHA PSCH PSCD PSCA HvA

0 0 1 0 0 1 25 10 5 2 19 11 7 2 3 3 0 1 3.4 3.5 2.25 39 3.5 3.26 3.6 3.42 2.3 2.2 37 1.95 1.86 2.02 1.92 24 0.25 2.02 1.95 1.94 1.9 3.22 3.5 2.36 0


Thanks

Answer
data[col_name].values.reshape(len(data), 1)

As given by Michael K above
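
For anyone landing here later, a minimal sketch of how that reshape could fit into the function from the question. The [1, 370] in the error means X was seen as 1 sample while y had 370, which suggests the 1-d predictor column was interpreted as a single row. The synthetic DataFrame below is a hypothetical stand-in for the real .csv (370 rows, only the HvA and FTR columns), and the list-wrapping of predictors is one possible approach, not necessarily what the original poster ended up using.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the .csv data: 370 rows with the two columns
# used in the question. The values are random and purely illustrative.
rng = np.random.RandomState(0)
data = pd.DataFrame({
    "HvA": rng.randint(0, 2, size=370),
    "FTR": rng.randint(0, 3, size=370),
})

# Selecting one column by name gives a 1-d result ...
x_1d = data["HvA"].values                        # shape (370,)
# ... which the answer's reshape turns into one column of 370 samples.
x_2d = data["HvA"].values.reshape(len(data), 1)  # shape (370, 1)
# Selecting with a list of names gives the same 2-d shape via pandas.
x_df = data[["HvA"]]                             # shape (370, 1)


def classification_model(model, data, predictors, outcome):
    # Accept a single column name or a list of names; selecting with a
    # list always yields a 2-d (n_samples, n_features) DataFrame.
    if isinstance(predictors, str):
        predictors = [predictors]
    X = data[predictors]
    y = data[outcome]  # the target is expected to stay 1-d
    model.fit(X, y)
    return model


model = classification_model(LogisticRegression(), data, "HvA", "FTR")
print(x_1d.shape, x_2d.shape, x_df.shape)
```

Either form works for a single feature. The list-based selection has the small advantage that the same function also accepts several predictor columns without further changes.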
