Sam Hammamy - 10 months ago

Python Question

I am trying to understand `backpropagation` in a simple neural network for `MNIST`.

There is the input layer with `weights` and a `bias`. The labels are `MNIST` digits, so the output is a vector of length `10`.

The second layer is a `linear transform`, followed by a `softmax activation` that turns the output into probabilities. `Backpropagation` computes the derivative at each step.

Previous layers append the `global` or `previous` gradient to the `local gradient`. I am having trouble computing the `local gradient` of the `softmax`.

Several resources online go through the explanation of the softmax and its derivatives, and even give code samples of the softmax itself:

    import numpy as np

    def softmax(x):
        """Compute the softmax of vector x."""
        exps = np.exp(x)
        return exps / np.sum(exps)

The derivative is explained separately for the cases `i = j` and `i != j`. This is the code I have come up with so far:

    def softmax(self, x):
        """Compute the softmax of vector x."""
        exps = np.exp(x)
        return exps / np.sum(exps)

    def forward(self):
        # self.input is a vector of length 10
        # and is the output of
        # (w * x) + b
        self.value = self.softmax(self.input)

    def backward(self):
        for i in range(len(self.value)):
            for j in range(len(self.input)):
                if i == j:
                    self.gradient[i] = self.value[i] * (1 - self.input[i])
                else:
                    self.gradient[i] = -self.value[i] * self.input[j]

Then `self.gradient` would be the `local gradient`. Is this correct?

Answer Source

I am assuming you have a 3-layer NN, where `W1`, `b1` are associated with the linear transformation from the input layer to the hidden layer, and `W2`, `b2` are associated with the linear transformation from the hidden layer to the output layer. `Z1` and `Z2` are the input vectors to the hidden layer and the output layer. `a1` and `a2` represent the outputs of the hidden layer and the output layer. `a2` is your predicted output. `delta3` and `delta2` are the backpropagated errors, from which the gradients of the loss function with respect to the model parameters follow.
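The quantities named above can be sketched in NumPy roughly as follows. This is only a sketch under assumptions the answer does not state: a `tanh` hidden activation, a cross-entropy loss summed over the batch, one-hot labels `Y`, and samples stored as rows of `X`. The function name `forward_backward` is hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    exps = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)

def forward_backward(X, Y, W1, b1, W2, b2):
    """One forward and backward pass; X is (n, d), Y is one-hot (n, k)."""
    # Forward pass
    Z1 = X @ W1 + b1          # input to the hidden layer
    a1 = np.tanh(Z1)          # hidden output (tanh is an assumption)
    Z2 = a1 @ W2 + b2         # input to the output layer
    a2 = softmax(Z2)          # predicted class probabilities

    # Backward pass for the summed cross-entropy loss -sum(Y * log(a2))
    delta3 = a2 - Y                            # error at the output layer
    dW2 = a1.T @ delta3
    db2 = np.sum(delta3, axis=0, keepdims=True)
    delta2 = (delta3 @ W2.T) * (1 - a1 ** 2)   # error at the hidden layer
    dW1 = X.T @ delta2
    db1 = np.sum(delta2, axis=0, keepdims=True)
    return a2, (dW1, db1, dW2, db2)
```

With softmax and cross-entropy combined, the output error simplifies to `a2 - Y`, so the softmax derivative never has to be formed explicitly in this setup.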

This is the general scenario for a 3-layer NN (an input layer, one hidden layer, and one output layer). You can follow the procedure described above to compute the gradients, which should be straightforward. Since another answer to this post has already pointed out the problem in your code, I am not repeating it here.
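If you do want the softmax derivative on its own, independent of any loss, a minimal sketch could look like the one below. The local gradient of a softmax over a vector is a matrix (the Jacobian), and it is expressed in terms of the softmax outputs: `s[i] * (1 - s[i])` when `i = j` and `-s[i] * s[j]` when `i != j`. The helper names `softmax_jacobian` and `softmax_backward` are hypothetical, and `upstream` is assumed to be the gradient of the loss with respect to the softmax output.

```python
import numpy as np

def softmax(x):
    """Softmax of a 1-D vector x, shifted by max(x) for numerical stability."""
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

def softmax_jacobian(s):
    """Jacobian J[i, j] = d s[i] / d x[j] for a softmax output s."""
    # Diagonal (i == j): s[i] * (1 - s[i]); off-diagonal (i != j): -s[i] * s[j]
    return np.diag(s) - np.outer(s, s)

def softmax_backward(s, upstream):
    """Chain rule: combine the gradient arriving from later layers with J."""
    return softmax_jacobian(s).T @ upstream

x = np.array([1.0, 2.0, 3.0])
s = softmax(x)
J = softmax_jacobian(s)  # shape (3, 3), one row per output, one column per input
```

Passing the cross-entropy gradient `-y / s` as `upstream` reduces `softmax_backward` to `s - y`, which matches the usual output-layer error for softmax with cross-entropy.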