Pragyaditya Das - 1 year ago
Python Question

# Linear regression using Python (Pandas and Numpy)

I am trying to implement linear regression using Python.

I did the following steps:

``````import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1
``````

Then I try to obtain the coefficients using the following:

``````regression_coeff = n.polyfit(x,y,1)
``````

And then I get the following error:

``````raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x
``````

I am unable to get my head around this, because when I print `x` and `y`, I can very clearly see that they are both 1D vectors.

The dataset can be found here: DataSets

The original code is:

``````import pandas as pd
import numpy as n

data = pd.read_csv('...\housing.csv', usecols = [1])
data1 = pd.read_csv('...\housing.csv', usecols = [3])

x = data
y = data1
regression = n.polyfit(x, y, 1)
``````

This should work (assuming NumPy is imported as `np`):

``````np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
``````

`data` is a DataFrame, not a Series, so its values are 2D:

``````>>> data.values.shape
(546, 1)
``````

`flatten()` turns it into a 1D array:

``````>>> data.values.flatten().shape
(546,)
``````

which is what `polyfit()` expects for `x`.
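To see the shape issue without the original file, here is a minimal sketch. The two small DataFrames are made-up stand-ins for the single-column frames returned by `read_csv(..., usecols=[...])`; the values are hypothetical and chosen so that `y = 2x + 1`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, not taken from Housing.csv.
data = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]})
data1 = pd.DataFrame({"bedrooms": [3.0, 5.0, 7.0, 9.0]})

# A single-column DataFrame is still 2D: (rows, 1).
print(data.values.shape)            # (4, 1)
print(data.values.flatten().shape)  # (4,)

# Passing the DataFrame directly reproduces the error from the question.
error_message = None
try:
    np.polyfit(data, data1, 1)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Flattening both inputs gives polyfit the 1D arrays it expects.
slope, intercept = np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
print(slope, intercept)  # roughly 2 and 1 for this made-up data
```

The printed table of a one-column DataFrame looks like a vector, which is probably why `x` and `y` appeared to be 1D when printed.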

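A simpler alternative is to read the whole file and select the columns by name, since each selected column is a 1D Series: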

``````df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
``````