qed - 4 months ago 27

Python Question

Here is a quick implementation of a one-layer neural network in python:

```
import numpy as np

# simulate data
np.random.seed(94106)
X = np.random.random((200, 3))  # 200 3d vectors
# first col is set to 1
X[:, 0] = 1

def simu_out(x):
    return np.sum(np.power(x, 2))

y = np.apply_along_axis(simu_out, 1, X)
# code 1 if above average
y = (y > np.mean(y)).astype("float64")*2 - 1

# split into training and testing sets
Xtr = X[:100]
Xte = X[100:]
ytr = y[:100]
yte = y[100:]

# 1 layer network. Final layer has one node
# initial weights
w = np.random.random(3)

def epoch():
    err_sum = 0
    global w
    for i in range(len(ytr)):
        learn_rate = .1
        s_l1 = Xtr[i].T.dot(w)  # signal at layer 1, pre-activation
        x_l1 = np.tanh(s_l1)  # output at layer 1, activation
        err = x_l1 - ytr[i]
        err_sum += err
        # see here: https://youtu.be/Ih5Mr93E-2c?t=51m8s
        delta_l1 = 2 * err * (1 - x_l1**2)
        dw = Xtr[i] * delta_l1
        w -= learn_rate * dw
    print("Mean error: %f" % (err_sum / len(ytr)))

epoch()
for i in range(1000):
    epoch()

def predict(X):
    global w
    return np.sign(np.tanh(X.dot(w)))

# > 80% accuracy!!
np.mean(predict(Xte) == yte)
```

It uses stochastic gradient descent for optimization. How would I apply mini-batch gradient descent here?

Answer

The difference between "classical" SGD and mini-batch gradient descent is that you use multiple samples (a so-called mini-batch) to calculate each update for `w`. This has the advantage that the steps you take in the direction of the solution are less noisy, as you follow a smoothed gradient.

To do that, you need an inner loop that iterates over the mini-batch and accumulates the update `dw`. For example (quick-n-dirty code):

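```
batch_size = 10  # assumed value; not defined in the original snippet

def epoch():
    global w
    err_sum = 0
    learn_rate = 0.1
    n_batches = int(np.ceil(len(ytr) / batch_size))
    for i in range(n_batches):
        # take the i-th consecutive block of samples as the mini-batch
        batch = Xtr[i * batch_size:(i + 1) * batch_size]
        target = ytr[i * batch_size:(i + 1) * batch_size]
        dw = np.zeros_like(w)
        # accumulate the gradient over the batch (the last one may be shorter)
        for j in range(len(batch)):
            s_l1 = batch[j].T.dot(w)
            x_l1 = np.tanh(s_l1)
            err = x_l1 - target[j]
            err_sum += err
            delta_l1 = 2 * err * (1 - x_l1**2)
            dw += batch[j] * delta_l1
        # one weight update per batch, using the averaged gradient
        w -= learn_rate * (dw / len(batch))
    print("Mean error: %f" % (err_sum / len(ytr)))
```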

In a test, this gave an accuracy of 87 percent.
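Written as a formula, the per-batch update in the code above looks roughly like this. The symbols are assumptions introduced here for illustration: $\eta$ stands for `learn_rate`, $B$ for the set of samples in the current mini-batch, $x_j$ for one input row and $y_j$ for its target.

$$
w \leftarrow w - \eta \cdot \frac{1}{|B|} \sum_{j \in B} 2\,\bigl(\tanh(x_j^\top w) - y_j\bigr)\,\bigl(1 - \tanh^2(x_j^\top w)\bigr)\, x_j
$$

With $|B| = 1$ this reduces to the per-sample SGD update from the question.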

Now, one more thing: you always go through the training set from start to end. You should definitely *shuffle* the data in each iteration. Always going through in the same order can really affect your performance, especially if you e.g. first have all samples of class A, and then all of class B. This can also make your training go in cycles. So just go through the set in a random order, e.g. with

```
order = np.random.permutation(len(ytr))
```

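and index the data through it. In the per-sample `epoch()` function that means replacing `Xtr[i]` and `ytr[i]` by `Xtr[order[i]]` and `ytr[order[i]]`; in the mini-batch version, slice `order` instead and use the slice to index `Xtr` and `ytr`.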
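To combine shuffling with mini-batches, one option is to cut the permuted index array into slices and use each slice to pick rows. The following is only a minimal sketch: the helper name `shuffled_batches`, the toy arrays and the batch size of 4 are assumptions for illustration and not part of the original code.

```python
import numpy as np

def shuffled_batches(n_samples, batch_size, rng):
    """Yield index arrays that cover 0..n_samples-1 in random order (hypothetical helper)."""
    order = rng.permutation(n_samples)  # a fresh random order on every call
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]  # the last slice may be shorter

rng = np.random.default_rng(0)
X_demo = np.arange(20).reshape(10, 2)  # 10 toy samples with 2 features each
y_demo = np.arange(10)
for idx in shuffled_batches(len(y_demo), 4, rng):
    batch, target = X_demo[idx], y_demo[idx]  # integer-array indexing selects the shuffled rows
    print(idx, target)
```

Calling the generator once per epoch gives a new order each time, so every sample is still visited exactly once per epoch.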

And a more general remark: global variables are often considered bad design, as you have no control over which snippet modifies your variables. It is better to pass `w` as a parameter. The same goes for the learning rate and the batch size.
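As a rough sketch of what that could look like, here is one epoch written as a function that takes the weights, data, learning rate and batch size as arguments and returns the new weights. The name `train_epoch`, its default values and the synthetic demo data are assumptions for illustration; the inner loop over the batch is also replaced by equivalent matrix operations.

```python
import numpy as np

def train_epoch(w, X, y, learn_rate=0.1, batch_size=10, rng=None):
    """Run one shuffled mini-batch epoch; return (updated weights, mean error)."""
    rng = np.random.default_rng() if rng is None else rng
    w = w.copy()  # leave the caller's array untouched
    order = rng.permutation(len(y))
    err_sum = 0.0
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        out = np.tanh(X[idx].dot(w))         # activations for the whole batch
        err = out - y[idx]
        err_sum += err.sum()
        delta = 2 * err * (1 - out**2)       # same delta as in the per-sample loop
        dw = X[idx].T.dot(delta) / len(idx)  # gradient averaged over the batch
        w -= learn_rate * dw
    return w, err_sum / len(y)

# usage on synthetic data (assumed here only for illustration)
rng = np.random.default_rng(0)
X_demo = rng.random((100, 3))
X_demo[:, 0] = 1                       # bias column, as in the question
y_demo = np.sign(X_demo[:, 1] - 0.5)   # labels in {-1, +1}
w_demo = rng.random(3)
for _ in range(200):
    w_demo, mean_err = train_epoch(w_demo, X_demo, y_demo, rng=rng)
print(np.mean(np.sign(X_demo.dot(w_demo)) == y_demo))
```

Because nothing is read from or written to module-level state, the same function can be reused with different data sets or hyperparameters, and the caller decides when to keep the returned weights.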