
How Precise Does an Activation Function Need to Be and How Large Will Its Inputs Be?

I am writing a basic neural network in Java and I am writing the activation functions (currently I have just written the sigmoid function). I am trying to use doubles (as opposed to BigDecimal) with hopes that training will actually take a reasonable amount of time. However, I've noticed that the function doesn't work with larger inputs. Currently my function is:

public static double sigmoid(double t) {
    return (1 / (1 + Math.pow(Math.E, -t)));
}

This function returns pretty precise values all the way down to t = -100, but when t >= 37 the function returns exactly 1.0. In a typical neural network, where the input is normalized, is this fine? Will a neuron ever get inputs summing to more than ~37? If the size of the summed input fed into the activation function varies from NN to NN, what are some of the factors that affect it? Also, is there any way to make this function more precise? Is there an alternative that is more precise and/or faster?


The surprising answer is that double actually gives you more precision than you need. This blog article by Pete Warden claims that even 8 bits are enough precision. And it is not just an academic idea: NVidia's new Pascal chips emphasize their single-precision performance above everything else, because that is what matters for deep learning training.

You should be normalizing your input neuron values. If extreme values still happen, it is fine to set them to -1 or +1. In fact, this answer shows doing that explicitly. (The other answers on that question are also interesting - for example, the suggestion to just pre-calculate 100 or so values and not use Math.exp() or Math.pow() at all!)
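A rough sketch of both ideas, with arbitrary choices (clamping normalized inputs to [-1, 1], and a table of 100 pre-calculated sigmoid values over [-8, 8]) purely for illustration:

public class FastSigmoid {

    private static final double MIN_T = -8.0;
    private static final double MAX_T = 8.0;
    private static final int STEPS = 100;
    private static final double[] TABLE = new double[STEPS + 1];

    static {
        // Pre-calculate sigmoid once over [MIN_T, MAX_T]; outside that range
        // the true value is already within ~3e-4 of 0 or 1.
        for (int i = 0; i <= STEPS; i++) {
            double t = MIN_T + (MAX_T - MIN_T) * i / STEPS;
            TABLE[i] = 1 / (1 + Math.exp(-t));
        }
    }

    // Clamp a normalized input to [-1, 1] if an extreme value slips through.
    public static double clampInput(double x) {
        return Math.max(-1.0, Math.min(1.0, x));
    }

    // Table-based sigmoid: no Math.exp()/Math.pow() call per activation.
    public static double sigmoid(double t) {
        if (t <= MIN_T) return 0.0;
        if (t >= MAX_T) return 1.0;
        int i = (int) Math.round((t - MIN_T) / (MAX_T - MIN_T) * STEPS);
        return TABLE[i];
    }
}

With 100 steps the table is only accurate to about 0.02 at worst, but per the article above that is still far more precision than the training process actually needs.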