Yangff - 1 month ago
Python Question

What makes it so hard for a neural network to learn a classifier where the class of x/256 is x?


  1. What I was originally doing is classifying some wave data using a neural network. In that problem, I have a vector of size about 200, and the number of classes is 256. However, the loss never goes down.

  2. So I thought: what if the wave is just the label? For example, $wave_i(x) = N(i/256.0, (1/10000)^2)$ is labelled $i$, where $N$ stands for the normal distribution.

  3. For a very small number of classes, like 32 or 64, the NN works well and learns rapidly.

  4. When I take it to classes = 256, however, learning is unbearably slow, or there is no learning at all.

  5. The model I'm using is pretty simple. I think it is enough even to memorize the relationship between input and output. (Why? You can easily construct a unit that outputs 1 when abs(input - const) < eps; see the sketch after this list.)

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    model = Sequential([
        Dense(classes, input_dim=200),
        Activation('sigmoid'),
        Dense(classes * 2),
        Activation('sigmoid'),
        Dense(classes),
        Activation('softmax'),
    ])


    Then I fed it data with a batch size of 256, in which every label occurs exactly once.

  6. The result: the loss reached 2.xxxx and the accuracy reached 0.07 after 2500 epochs, and it stopped changing after 3000 epochs (accuracy around 0.09 to 0.1).
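
To make the claim in point 5 concrete, here is a minimal numpy sketch (my illustration, not code from the question) of such a unit: the difference of two steep sigmoids outputs roughly 1 exactly when abs(input - const) < eps, and roughly 0 elsewhere. The steepness k and the test values are arbitrary choices.

    import numpy as np

    def sigmoid(z):
        # Clip to avoid overflow in exp for large |z|.
        return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

    def bump_unit(x, const, eps, k=10000.0):
        # Two steep sigmoids offset by +/- eps around const; their
        # difference is ~1 inside the window and ~0 outside it.
        return sigmoid(k * (x - (const - eps))) - sigmoid(k * (x - (const + eps)))

    for x in (0.40, 0.498, 0.500, 0.502, 0.60):
        print(x, round(float(bump_unit(x, const=0.5, eps=1 / 256.0)), 3))
    # ~0.0 away from 0.5, ~1.0 within eps of it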



I know that more variables need more time to learn. However, it seems clear that each single output cell should easily be able to decouple its output from the others (the input sets are very different), so why doesn't it?

import numpy
from keras.utils.np_utils import to_categorical

def generator():
    while 1:
        # One length-200 sample per class, drawn from N(i/255, (1/10000)^2).
        data = numpy.array([numpy.random.normal(i / 255.0, 1 / 10000.0, 200)
                            for i in range(0, classes)])
        labels = to_categorical([i for i in range(0, classes)], classes)
        yield (data, labels)
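
For completeness, the training wiring might look like the sketch below. The loss, optimizer, and epoch count are my assumptions (the question does not show this part), and the call uses the Keras 2-style fit_generator signature with the model and generator defined above.

    # Assumed setup: plain SGD with categorical cross-entropy.
    model.compile(optimizer='sgd', loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Each generator batch holds all 256 classes once, so one step per epoch.
    model.fit_generator(generator(), steps_per_epoch=1, epochs=3000)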

Answer

When the relationship between input and output is as simple as the one you are exploring, it may not play to the strengths of a neural network. A network is flexible enough to fit almost any function, but it rarely fits one perfectly, so with a simple function you will spot the imperfections in the fit, and models other than neural networks will often do a better job.

Some things you could potentially do to get a better fit (roughly in order of things I would try):

  1. Try a different optimiser. You don't say which optimiser you are using, but the Keras library comes with a few choices.

  2. Neural networks work better when training and predicting against input features that have been normalised. An effective choice is mean 0, standard deviation 1. In your case, if you pre-process each batch - when training and testing - like this: data = (data - 0.5)/0.289, it may help. (0.289 is roughly the standard deviation of a uniform distribution on [0, 1], which is how your class means are spread.)

  3. Increase the number of neurons in the hidden layers, and/or change the activation function. Your ideal activation function here might even be shaped like a Gaussian (so a single neuron could immediately tune to each class), but that isn't something you usually find in a NN library. Consider dropping the middle layer too, and just having e.g. 8 * classes neurons in a single hidden layer before the softmax.*

  4. Sample from your input examples in the generator instead of calculating one from each class each time. The generator is potentially too regular - I have seen the classic xor example network get stuck in a similar way to your description when fed the same inputs repeatedly. (A combined sketch of points 1, 2 and 4 follows below.)
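
Taken together, points 1, 2 and 4 might look like the sketch below. Adam is just one of the optimisers Keras ships, and random_generator is a hypothetical replacement for the question's generator that samples classes at random and applies the normalisation from point 2; none of this is the asker's original code.

    import numpy
    from keras.optimizers import Adam
    from keras.utils.np_utils import to_categorical

    def random_generator(batch_size=256):
        while 1:
            # Point 4: draw class labels at random instead of one of each per batch.
            picks = numpy.random.randint(0, classes, size=batch_size)
            data = numpy.array([numpy.random.normal(i / 255.0, 1 / 10000.0, 200)
                                for i in picks])
            # Point 2: shift inputs to roughly mean 0, standard deviation 1.
            data = (data - 0.5) / 0.289
            labels = to_categorical(picks, classes)
            yield (data, labels)

    # Point 1: try a different optimiser, e.g. Adam instead of plain SGD.
    model.compile(optimizer=Adam(), loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit_generator(random_generator(), steps_per_epoch=1, epochs=1000)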


* The simpler network model would look like this:

model = Sequential([
  Dense(classes * 8, input_dim=200), 
  Activation('sigmoid'), 
  Dense(classes), 
  Activation('softmax'), 
])