harry lakins - 2 days ago
Python Question

Why is my neural network not working?

background

I have created a neural network that can have n inputs, n hidden layers of n length, and n outputs. When using it for handwriting recognition with the Kaggle dataset (a 76 MB text file of 28x28 matrices of 0-255 values for handwritten digits), the results show that somewhere, something must be wrong. In this case, I am using 784 inputs (one per pixel of the 28x28 image), 1 hidden layer of 15 neurons, and an output layer of 10 neurons.

Output guesses are a vector like this: [0,0,0,1,0,0,0,0,0,0] - which would mean it's guessing a 3. This is based on http://neuralnetworksanddeeplearning.com/chap1.html#a_simple_network_to_classify_handwritten_digits
(same principles and setup)
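
For reference, the helper that builds this vector (get_desired_list in the code below) is essentially just a one-hot list builder - roughly this, assuming zero-indexed digits:

def get_desired_list(self, desired_number):
    # build a one-hot vector of length 10: all zeros except a 1 at the
    # index of the desired digit (digits assumed zero-indexed)
    desired = [0] * 10
    desired[desired_number] = 1
    return desired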

I am assuming my problem is somewhere within the back propagation - and because my program has a completely flexible network size in all dimensions (layers, length of layers, etc.), my algorithm for back propagating is quite complex - and based on the chain rule explained here: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Essentially, the total error for each output is calculated with respect to each weight, and for hidden layers, the sum of the weight changes already computed for the column of weights nearer the output is used.
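
For a single weight feeding an output neuron, that chain rule boils down to something like this (a sketch of the idea from the tutorial, not my exact code):

# gradient of the squared error E = 0.5 * (out - target)**2 with respect to
# one weight into an output neuron, assuming a sigmoid activation:
#   dE/dw = (out - target) * out * (1 - out) * input_to_that_weight
def output_weight_gradient(out, target, input_value):
    return (out - target) * out * (1 - out) * input_value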

notes about code

neuron_weights is of the structure

[
layers[neuron[weights]]
]


where each weight is initially a random float.
weight_changes
has the exact same structure as neuron_weights.

neurons is of the structure

[
layers[neuron]
]


where each neuron is the activated neuron value.

...so if there are three neuron layers (neurons[0] for the inputs, neurons[1] for the hidden layer, neurons[2] for the outputs), there are two weight layers: neuron_weights[0] and neuron_weights[1].
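
For the 784-15-10 network described above, building these structures looks roughly like this (a sketch - the random initialisation range is just an assumption):

import random

layer_sizes = [784, 15, 10]  # inputs, hidden, outputs

# neurons[layer][neuron] -> activated value (the input layer just holds the pixel values)
neurons = [[0.0] * size for size in layer_sizes]

# neuron_weights[column][neuron][weight] -> one weight per neuron in the layer below,
# so column 0 connects inputs->hidden and column 1 connects hidden->outputs
neuron_weights = [
    [[random.uniform(-1, 1) for _ in range(layer_sizes[col])]
     for _ in range(layer_sizes[col + 1])]
    for col in range(len(layer_sizes) - 1)
]

# weight_changes mirrors neuron_weights exactly, initially all zeros
weight_changes = [
    [[0.0] * len(weights) for weights in layer]
    for layer in neuron_weights
]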


...the following back prop function gets called for every matrix in a loop elsewhere, just after it has been fed forward (which I have tested and works). So assume the neurons and weights are set and ready to be back propagated.
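
For completeness, the feed forward step is the usual weighted-sum-plus-sigmoid pass over each weight column - roughly this (a sketch, not my exact code; no bias terms):

import math

def sigmoid(x):
    # standard logistic activation
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(self):
    # each neuron in the next layer is the sigmoid of the weighted sum
    # of the previous layer's activated values
    for col in range(len(self.neuron_weights)):
        for n, weights in enumerate(self.neuron_weights[col]):
            total = sum(w * self.neurons[col][i] for i, w in enumerate(weights))
            self.neurons[col + 1][n] = sigmoid(total)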

my back prop code

desired_list = self.get_desired_list(desired_number)  # returns a list of 0s and a single 1 (e.g. [0,0,0,0,0,1,0,0,0,0]) for comparison to the output

for weight_column in range(len(self.neuron_weights)-1, -1, -1):  # loop backwards through the weight columns, output side first

    e_total = 0

    for neuron_weight_num in range(0, len(self.neuron_weights[weight_column])):  # loop through each neuron in the weight column (group of weights in each column)

        neuron_weight_value = self.neurons[weight_column+1][neuron_weight_num]

        act_to_sum_step = neuron_weight_value * (1 - neuron_weight_value)  # sigmoid derivative: step from the activated value back to the pre-sigmoid sum

        for singleweight_num in range(0, len(self.neuron_weights[weight_column][neuron_weight_num])):  # loop through each single weight to update

            curr_weight_value = self.neuron_weights[weight_column][neuron_weight_num][singleweight_num]

            if weight_column == len(self.neuron_weights)-1:  # if output column, step back from the desired values
                step_back_error_value = neuron_weight_value - desired_list[neuron_weight_num-1]
                e_total += (0.5*step_back_error_value)**2
            else:  # otherwise, sum up the changes stored for the previous (nearer-output) column of weights
                weight_column_to_sum = weight_column + 1
                step_back_error_value = 0
                for weight_change_neuron_num in range(0, len(self.weight_changes[weight_column_to_sum])):
                    before_change_weight = self.weight_changes[weight_column_to_sum][weight_change_neuron_num][neuron_weight_num]
                    step_back_error_value += before_change_weight

            input_to_weight_neuron_value = self.neurons[weight_column][singleweight_num]

            # derivative of the activated neuron value with respect to this weight
            act_to_weight_val = act_to_sum_step * input_to_weight_neuron_value

            complete_step_back_value = step_back_error_value * act_to_weight_val

            # save the weight change value for later use (if back prop goes further back)
            self.weight_changes[weight_column][neuron_weight_num][singleweight_num] = complete_step_back_value

            # update the weight value
            new_w_value = curr_weight_value - (self.learn_rate * complete_step_back_value)
            self.neuron_weights[weight_column][neuron_weight_num][singleweight_num] = new_w_value

    print(e_total)
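
The outer loop that calls it looks roughly like this (names are hypothetical, but this is the idea):

# hypothetical outer loop: net is the network instance, training_data is the parsed Kaggle file
for epoch in range(num_epochs):
    for pixels, label in training_data:  # pixels: 784 values scaled to 0-1
        net.neurons[0] = pixels
        net.feed_forward()
        net.back_propagate(label)  # the function shown above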


When using a learning rate of 0.5, e_total starts at 2.252, within a minute gets down to 0.4462, and then within 5 minutes gets no lower than 0.2.

This makes me think something must be working. But when I output the desired outputs and the output guesses, they rarely match, even after 5 minutes of iteration/learning. I would hope to see results like this:

output layer: [0.05226,0.0262,0.03262,0.0002, 0.1352, 0.99935, 0.00, etc]
output desired: [0,0,0,0,0,1,0, etc]


(all < 0.1 except the correct guess value, which should be > 0.9)

but instead I get things like:

output layer: [0.15826,0.0262,0.33262,0.0002, 0.1352, 0.0635, 0.00, etc]
output desired: [0,1,0,0,0,0,0, etc]


(all < 0.1, so no clear classification, let alone an accurate one.)

I even added a line of code to output 'correct' when the guess value and desired value match - and even though, as I said, the e_total decreases, 'correct' came up only about 1 in 10 times - which is no better than random!
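
For reference, my 'correct' check treats the guess as the index of the largest output value - roughly this (a sketch):

def is_correct(output_layer, desired_list):
    # the guess is the index of the highest activation; it is 'correct'
    # when that index lines up with the 1 in the desired vector
    guess = output_layer.index(max(output_layer))
    return desired_list[guess] == 1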

I have tried different hidden layer lengths and all sorts of different learning rates - but no good.

I've given more information in the comments, which may help.

UPDATE:

As recommended, I have used my system to try to learn the XOR function - with 2 inputs, 1 hidden layer of 2 neurons, and 1 output.
This means the
desired_list
is now a single-element array, either [1] or [0]. Output values seem to be random, > 0.5 and < 0.7, with no clear relation to the desired output. Just to confirm, I have manually tested my feed forward and back prop many times, and they definitely work as explained in the tutorials I've linked.
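
For the XOR test, the data is just the four input pairs and their target values - set up roughly like this (a sketch; for XOR, get_desired_list(label) would simply return [label]):

# the four XOR cases: input pair and the single desired output value
xor_data = [
    ([0, 0], 0),
    ([0, 1], 1),
    ([1, 0], 1),
    ([1, 1], 0),
]

# hypothetical training loop over many epochs
for epoch in range(10000):
    for inputs, label in xor_data:
        net.neurons[0] = inputs
        net.feed_forward()
        net.back_propagate(label)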

Answer

You don't need to reinvent the wheel... You can use the pybrain module, which provides optimized supervised learning features like back-propagation, R-Prop, etc. (it also has unsupervised learning, reinforcement learning and black-box optimization algorithm features).
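
For example, a minimal sketch of pybrain's standard workflow, with dimensions matching your 784-15-10 setup (loading and scaling the Kaggle data is left to you; training_examples is a placeholder for your parsed rows):

from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

# 784 inputs, one hidden layer of 15 neurons, 10 outputs
net = buildNetwork(784, 15, 10)

ds = SupervisedDataSet(784, 10)
for pixels, label in training_examples:  # your parsed Kaggle rows
    target = [0] * 10
    target[label] = 1
    ds.addSample(pixels, target)

trainer = BackpropTrainer(net, ds, learningrate=0.01)
for epoch in range(20):
    print(trainer.train())  # average error for this epoch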

You can find here an example of how to use the pybrain module to do OCR with a 10×9 input array (just adapt it to your 28x28 needs).

If you definitely want to reinvent the wheel... you can do some introspection of the pybrain source code (because pybrain's version of back prop works) in order to explain/double-check why your version is not working.

As NN debugging is a difficult task, you may also want to publish more code and share any resources related to your code...

Regards

Comments