NoviceCoder - 4 months ago

Python Question

I wanted to learn about machine learning, stumbled upon Siraj's YouTube channel and his Udacity videos, and wanted to try to pick up a few things.

His video in reference: https://www.youtube.com/watch?v=vOppzHpvTiQ&index=1&list=PL2-dafEMk2A7YdKv4XfKpfbTH5z6rEEj3

In the video he imports and reads a txt file, but when I tried to recreate that txt file it couldn't be read in correctly. Instead, I created a pandas DataFrame with the same data and tried to run the linear regression fit/predict on it, but then I got the error below.

**Found input variables with inconsistent numbers of samples: [1, 16]**, along with a message about passing 1d arrays and needing to reshape them.

Then, when I tried to reshape them following this post: Sklearn : ValueError: Found input variables with inconsistent numbers of samples: [1, 6]

I got this error:

**shapes (1,16) and (1,1) not aligned: 16 (dim 1) != 1 (dim 0)**

My code is below. I know it's probably a syntax error; I'm just not familiar with scikit-learn yet and would like some help.

```
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

#DF = pd.read_fwf('BrainBodyWeight.txt')
DF = pd.DataFrame()
DF['Brain'] = [3.385, .480, 1.350, 465.00,36.330, 27.660, 14.830, 1.040, 4.190, 0.425, 0.101, 0.920, 1.000, 0.005, 0.060, 3.500 ]
DF['Body'] = [44.500, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5,58, 6.40, 4, 5.7,6.6, .140,1, 10.8]

try:
    x = DF['Brain']
    y = DF['Body']
    x = x.tolist()
    y = y.tolist()
    x = np.asarray(x)
    y = np.asarray(y)
    body_reg = linear_model.LinearRegression()
    body_reg.fit(x.reshape(-1,1),y.reshape(-1,1))
    plt.scatter(x,y)
    plt.plot(x,body_reg.predict(x))
    plt.show()
except Exception as e:
    print(e)
```

Can anyone explain why sklearn doesn't like my input?

Answer

According to the documentation, `LinearRegression.fit()` requires an `x` array of shape `[n_samples, n_features]`. That is why you reshape your `x` array before calling `fit`: without the reshape you have an array of shape `(16,)`, which does not meet the required `[n_samples, n_features]` shape because there is no `n_features` dimension.

```
x = DF['Brain']
x = x.tolist()
x = np.asarray(x)
# 16 samples, no feature axis
x.shape                 # (16,)
# 16 samples, 1 feature
x.reshape(-1,1).shape   # (16, 1)
```
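To see the shape requirement in action, here is a minimal sketch that uses the data from the question. It assumes a reasonably recent scikit-learn, where passing a 1-D `x` to `fit` raises a `ValueError`; the exact message differs between versions. The names `brain`, `body`, and `reg` are only for illustration.

```python
import numpy as np
from sklearn import linear_model

# Brain and body weights from the question (16 samples each).
brain = np.array([3.385, 0.480, 1.350, 465.00, 36.330, 27.660, 14.830, 1.040,
                  4.190, 0.425, 0.101, 0.920, 1.000, 0.005, 0.060, 3.500])
body = np.array([44.5, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5,
                 58, 6.40, 4, 5.7, 6.6, 0.140, 1, 10.8])

reg = linear_model.LinearRegression()

# A 1-D array has no feature axis, so fit() rejects it.
try:
    reg.fit(brain, body)
    fit_1d_failed = False
except ValueError as err:
    fit_1d_failed = True
    print("1-D input rejected:", err)

# The same data as a (16, 1) column is accepted.
reg.fit(brain.reshape(-1, 1), body)
print(reg.coef_.shape)
```

Only `x` needs the extra axis here; `y` can stay 1-D for a single target.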

The same requirement applies to `LinearRegression.predict` (which also keeps things consistent), so you simply need to do the same reshaping when calling `predict`.

```
plt.plot(x,body_reg.predict(x.reshape(-1,1)))
```
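Putting the two calls together, a minimal sketch of the corrected fit and predict steps might look like this. Plotting is left out so the snippet runs without a display; the variable `predicted` is an illustrative name, not something from the original code.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

DF = pd.DataFrame({
    'Brain': [3.385, 0.480, 1.350, 465.00, 36.330, 27.660, 14.830, 1.040,
              4.190, 0.425, 0.101, 0.920, 1.000, 0.005, 0.060, 3.500],
    'Body': [44.5, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5,
             58, 6.40, 4, 5.7, 6.6, 0.140, 1, 10.8],
})

x = np.asarray(DF['Brain'])   # shape (16,)
y = np.asarray(DF['Body'])    # shape (16,)

body_reg = linear_model.LinearRegression()

# Reshape x to (16, 1) for fit() ...
body_reg.fit(x.reshape(-1, 1), y)

# ... and apply the same reshape for predict().
predicted = body_reg.predict(x.reshape(-1, 1))
print(x.shape, predicted.shape)
```

Because `x` and `predicted` have the same length, they can be passed straight to `plt.plot(x, predicted)` as in the question.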

Alternatively, you can reshape the `x` array once, before calling any functions.

For future reference, you can get the underlying NumPy array of values by calling `DF['Brain'].values`. You don't need to convert to a list and then to a NumPy array, so you can use this instead of all the conversions:

```
# (-1, 1) gives 16 samples with 1 feature; (1, -1) would give 1 sample with 16 features
x = DF['Brain'].values.reshape(-1,1)
y = DF['Body'].values.reshape(-1,1)
body_reg = linear_model.LinearRegression()
body_reg.fit(x, y)
```
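As a further option not mentioned above, selecting the column with double brackets returns a one-column DataFrame that already has the `[n_samples, n_features]` shape, so no reshape is needed. This is a sketch under that assumption; `new_brain` and its value `2.0` are made up purely to show a prediction for unseen input.

```python
import pandas as pd
from sklearn import linear_model

DF = pd.DataFrame({
    'Brain': [3.385, 0.480, 1.350, 465.00, 36.330, 27.660, 14.830, 1.040,
              4.190, 0.425, 0.101, 0.920, 1.000, 0.005, 0.060, 3.500],
    'Body': [44.5, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5,
             58, 6.40, 4, 5.7, 6.6, 0.140, 1, 10.8],
})

X = DF[['Brain']]   # double brackets: DataFrame of shape (16, 1)
y = DF['Body']      # single brackets: Series of shape (16,)

body_reg = linear_model.LinearRegression()
body_reg.fit(X, y)

# Input to predict() must be 2-D as well; the value here is hypothetical.
new_brain = pd.DataFrame({'Brain': [2.0]})
new_pred = body_reg.predict(new_brain)
print(X.shape, new_pred.shape)
```

Using the same column name in `new_brain` keeps the input consistent with what the model saw during `fit`.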