
I'm currently writing an implementation of univariate linear regression in Python:

```
# implementation of univariate linear regression
import numpy as np

def cost_function(hypothesis, y, m):
    return (1 / (2 * m)) * ((hypothesis - y) ** 2).sum()

def hypothesis(X, theta):
    return X.dot(theta)

def gradient_descent(X, y, theta, m, alpha):
    for i in range(1500):
        temp1 = theta[0][0] - alpha * (1 / m) * (hypothesis(X, theta) - y).sum()
        temp2 = theta[1][0] - alpha * (1 / m) * ((hypothesis(X, theta) - y) * X[:, 1]).sum()
        theta[0][0] = temp1
        theta[1][0] = temp2
    return theta

if __name__ == '__main__':
    data = np.loadtxt('data.txt', delimiter=',')
    y = data[:, 1]
    m = y.size
    X = np.ones(shape=(m, 2))
    X[:, 1] = data[:, 0]
    theta = np.zeros(shape=(2, 1))
    alpha = 0.01
    print(gradient_descent(X, y, theta, m, alpha))
```

This code outputs NaN for theta after the values blow up to infinity. I can't figure out what's going wrong, but it surely has something to do with how I update theta in the gradient descent function.

The data I'm using is a simple linear-regression pairs dataset I found online, and it loads in correctly.

Can anyone point me in the right direction?


Answer

The problem is that when you do `X[:,1]` or `data[:,1]`, you get an array of shape (m,). When you multiply an array of shape (m,) with an array of shape (m,1), NumPy broadcasting gives you a matrix of shape (m,m):

```
import numpy as np

a = np.array([1, 2, 3])        # shape (3,)
b = np.array([[4], [5], [6]])  # shape (3, 1)
print((a * b).shape)           # broadcasting gives (3, 3)
```

If you do `y = y.reshape((m, 1))` in your `if __name__` block, and inside your `gradient_descent` function you use

```
X_1 = X[:, 1].reshape((m, 1))
```

in place of `X[:, 1]`, that should fix the problem. Right now, when you do

```
((hypothesis(X, theta) - y) * X[:, 1])
```

you're getting an m × m matrix (100 × 100 with the test data below), which is not what you want.
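A quick way to see this with the actual expressions from the question (my own sketch, using a stand-in m = 5 rather than the real data):

```
import numpy as np

m = 5
X = np.ones((m, 2))
X[:, 1] = np.arange(m)         # stand-in feature column
theta = np.zeros((2, 1))       # shape (2, 1), as in the question
y = np.arange(m, dtype=float)  # shape (m,), the problematic shape

# X.dot(theta) is (m, 1); subtracting a (m,) array broadcasts to (m, m),
# so even the temp1 sum in the question is taken over m*m entries
print((X.dot(theta) - y).shape)                  # (5, 5)

# with y reshaped to a column vector, the shapes line up
print((X.dot(theta) - y.reshape((m, 1))).shape)  # (5, 1)
```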

The full code I used for testing is:

```
# implementation of univariate linear regression
import numpy as np

def cost_function(hypothesis, y, m):
    return (1 / (2 * m)) * ((hypothesis - y) ** 2).sum()

def hypothesis(X, theta):
    return X.dot(theta)

def gradient_descent(X, y, theta, m, alpha):
    X_1 = X[:, 1]
    X_1 = X_1.reshape((m, 1))
    for i in range(1500):
        temp1 = theta[0][0] - alpha * (1 / m) * (hypothesis(X, theta) - y).sum()
        temp2 = theta[1][0] - alpha * (1 / m) * ((hypothesis(X, theta) - y) * X_1).sum()
        theta[0][0] = temp1
        theta[1][0] = temp2
    return theta

if __name__ == '__main__':
    data = np.random.normal(size=(100, 2))
    y = 30 * data[:, 0] + data[:, 1]
    m = y.size
    X = np.ones(shape=(m, 2))
    y = y.reshape((m, 1))
    X[:, 1] = data[:, 0]
    theta = np.zeros(shape=(2, 1))
    alpha = 0.01
    print(gradient_descent(X, y, theta, m, alpha))
```
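As an aside (my own sketch, not part of the original answer): once `y` and `theta` are both column vectors, the whole update can be written as one matrix expression, which avoids the per-component temporaries and the reshape of `X[:, 1]` entirely:

```
import numpy as np

def gradient_descent_vectorized(X, y, theta, m, alpha, iterations=1500):
    # y is (m, 1) and theta is (2, 1), so X.dot(theta) - y stays (m, 1)
    for _ in range(iterations):
        gradient = (1 / m) * X.T.dot(X.dot(theta) - y)  # shape (2, 1)
        theta = theta - alpha * gradient
    return theta
```

With the synthetic data above it should land near `[[0], [30]]`, give or take the noise term, matching the loop version.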
