David L-R - 5 months ago 15

Ruby Question

After my previous attempt, I managed to train a neural network to express the sine function. I used the ai4r Ruby gem:

```ruby
require 'ai4r'

srand 1

# One input, 60 hidden nodes, one output
net = Ai4r::NeuralNetwork::Backpropagation.new([1, 60, 1])
net.learning_rate = 0.01
#net.propagation_function = lambda { |x| 1.0 / ( 1.0 + Math::exp( -x ) ) }

# Linearly map x from [xmin, xmax] to [ymin, ymax]
def normalise(x, xmin, xmax, ymin, ymax)
  xrange = xmax - xmin
  yrange = ymax - ymin
  return ymin + (x - xmin) * (yrange.to_f / xrange)
end

training_data = Array.new
test = Array.new
i2 = 0.0
320.times do |i|
  i2 += 0.1
  hash = Hash.new
  output = Math.sin(i2.to_f)
  input = i2.to_f
  hash.store(:input,[normalise(input,0.0,32.0,0.0,1.0)])
  hash.store(:expected_result,[normalise(output,-1.0,1.0,0.0,1.0)])
  training_data.push(hash)
  test.push([normalise(output,-1.0,1.0,0.0,1.0)])
end
puts "#{test}"
puts "#{training_data}"

time = Time.now
999999.times do |i|
  error = 0.0
  training_data.each do |d|
    error+=net.train(d[:input], d[:expected_result])
  end
  if error < 0.26
    break
  end
  print "Times: #{i}, error: #{error} \r"
end
time2 = Time.now
puts "#{time2}-#{time} = #{time2-time} seconds elapsed."

# Marshal output is binary, so write the file in binary mode
serialized = Marshal.dump(net)
File.open("net.saved", "wb") { |file| file.write(serialized) }
```

Everything worked out fine. The network was trained in 4703.664857 seconds.

The network trains much faster when I normalise the input and output to values between 0 and 1.

`ai4r`

In the sine example, is it possible to input any number as in:

```
Input: -10.0 -> Output: 0.5440211108893699
Input: 87654.322 -> Output: -0.6782453567239783
Input: -9878.923 -> Output: -0.9829544956991526
```

or do I have to define the range?

Answer

In your structure you have 60 hidden nodes after a single input. This means that each hidden node has only one learned weight, for a total of 60 learned values. The connection from the hidden layer to the single output node likewise has 60 weights. Ignoring bias terms, this gives a total of 120 learnable parameters.
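The count above can be reproduced with a few lines of Ruby. This is only a sketch: `weight_count` is a hypothetical helper, not part of ai4r, and the bias variant assumes one extra bias unit feeding each non-input layer, which may not match how a given library lays out its weights.

```ruby
# Hypothetical helper (not part of ai4r): counts the weights between
# consecutive fully connected layers.
def weight_count(layers, bias: false)
  layers.each_cons(2).sum do |from, to|
    (bias ? from + 1 : from) * to
  end
end

structure = [1, 60, 1]
puts weight_count(structure)              # 120 (weights only, as counted above)
puts weight_count(structure, bias: true)  # 181 (assuming one bias unit per layer)
```

Either way, the number of learnable values is small, which is why the spread of the hidden weights matters so much in the discussion that follows.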

Imagine what each node in the hidden layer is capable of learning: a single scaling factor followed by a non-linearity. Let's assume that your weights end up looking like:

`[1e-10, 1e-9, 1e-8, ..., .1]`

with each entry being the weight of one node in the hidden layer. Now, if you pass the number 1 into your network, the hidden layer will output something to this effect:

`[0, 0, 0, 0, ..., .1, .25, .5, .75, 1]`

(roughly speaking, not actually calculated)

Likewise, if you give it something large, like 1e10, then the first layer would give:

`[0, .25, .5, .75, 1, 1, 1, ..., 1]`
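To make the saturation effect concrete, the sketch below pushes a small and a large input through a row of hidden nodes whose weights are spaced by powers of ten. It assumes a logistic activation (the one in the commented-out line of the question) and no bias; `hidden_activations` and the weight list are illustrative names and values, not taken from a trained network. With a logistic activation an unexcited node sits near 0.5 rather than 0, so the vectors above should be read qualitatively.

```ruby
# Logistic activation, matching the commented-out propagation_function
# in the question.
def sigmoid(x)
  1.0 / (1.0 + Math.exp(-x))
end

# Hypothetical helper: activation of each hidden node for a single input,
# assuming one weight per node and no bias.
def hidden_activations(x, weights)
  weights.map { |w| sigmoid(w * x) }
end

# Illustrative weights spaced by a factor of ten: 1e-10, 1e-9, ..., 0.1
weights = (1..10).map { |k| 10.0**(-k) }.reverse

small = hidden_activations(1.0, weights)
large = hidden_activations(1e10, weights)

puts small.map { |a| a.round(3) }.inspect  # every node stays close to 0.5
puts large.map { |a| a.round(3) }.inspect  # nodes with larger weights saturate at 1.0
```

For the small input almost no node reacts, while for the large input almost every node is saturated; in both cases only a few nodes carry usable information.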

The weights of your hidden layer will learn to spread out in this fashion so that they can handle a large range of inputs by scaling them to a smaller range. The more hidden nodes you have (in that first layer), the less far apart each node has to be. In my example they are spaced out by a factor of ten. If you had thousands of nodes, they would be spaced out by a factor of maybe 2.

By normalizing the input range to [0, 1], you restrict how far apart those hidden nodes need to spread before they can start giving meaningful information to the final layer. This allows for faster training (assuming your stopping condition is based on change in loss).
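As a rough illustration of that normalisation, the sketch below reuses the `normalise` helper from the question to map an input from the training range [0, 32] into [0, 1], and to map a network output from [0, 1] back to the sine range [-1, 1]. The value 0.75 stands in for a network output and is made up for the example; no network is evaluated here.

```ruby
# Linear rescaling helper, equivalent to the one defined in the question.
def normalise(x, xmin, xmax, ymin, ymax)
  ymin + (x - xmin) * ((ymax - ymin).to_f / (xmax - xmin))
end

# Scale a raw input from the training range [0, 32] to [0, 1].
scaled_input = normalise(16.0, 0.0, 32.0, 0.0, 1.0)

# Placeholder for a network output in [0, 1] (made up, not a real prediction).
fake_output = 0.75

# Undo the output scaling: [0, 1] back to [-1, 1].
sine_estimate = normalise(fake_output, 0.0, 1.0, -1.0, 1.0)

puts scaled_input   # 0.5
puts sine_estimate  # 0.5
```

The same helper works in both directions because it is just a linear map between two intervals.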

So, to directly answer your question: no, you do not *need* to normalize, but it certainly helps speed up training by reducing the variability and size of the input space.

Source (Stackoverflow)
