
Theano: How to give training data to a neural network

I'm trying to create a simple multilayer Perceptron (MLP) for "logical and" in Theano.
There is one layer between input and output. The structure is this:

2 value input -> multiply with weights, add bias -> softmax -> 1 value output

The change in dimension is caused by the weights matrix.
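
To make the intended shapes concrete, here is a minimal numpy sketch of that forward pass (the concrete values are only illustrative and are not part of the question):

import numpy

x = numpy.array([1, 1])                      # 2-value input
W = numpy.zeros((2, 1))                      # weight matrix: 2 inputs -> 1 output
b = numpy.zeros(1)                           # bias for the single output unit

z = numpy.dot(x, W) + b                      # shape (1,): the weights reduce 2 values to 1
output = numpy.exp(z) / numpy.exp(z).sum()   # softmax over that single value
print(output)                                # [ 1.]  -- a softmax over one value is always 1

Note that a softmax over a single output value is always 1, which already hints at the dimensionality issue discussed in the updates below.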

The implementation is based on this tutorial: http://deeplearning.net/tutorial/logreg.html

This is my class for the Layer:

class Layer():
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name="W",
            borrow=True
        )
        self.b = theano.shared(
            value=numpy.zeros((n_in, n_out),
                              dtype=theano.config.floatX),
            name="b",
            borrow=True
        )

        self.output = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.params = (self.W, self.b)
        self.input = input


The class is meant to be modular. Instead of just one layer, I want to be able to add multiple layers.
Therefore the functions for prediction, cost and errors are outside of the class (as opposed to the tutorial):

def y_pred(output):
    return T.argmax(output, axis=1)


def negative_log_likelihood(output, y):
    return -T.mean(T.log(output)[T.arange(y.shape[0]), y])


def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()


Logical AND has 4 training cases:


  • [0,0] -> 0

  • [1,0] -> 0

  • [0,1] -> 0

  • [1,1] -> 1



Here is the setup of the classifier and the functions for training and evaluation:

data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])

data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                                          dtype=theano.config.floatX),
                            borrow=True)

train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                                                 dtype=theano.config.floatX),
                                   borrow=True), "int32")

x = T.vector("x", theano.config.floatX)  # data
y = T.ivector("y")  # labels

classifier = Layer(input=x, n_in=2, n_out=1)

cost = negative_log_likelihood(classifier.output, y)

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
index = T.lscalar()

learning_rate = 0.15

updates = [
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
]

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index],
        y: train_set_y[index]
    }
)
validate_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: train_set_x[index],
        y: train_set_y[index]
    }
)


I tried to follow the conventions. Each row in the data matrix is a training sample, and each training sample is matched to the correct output.
Unfortunately the code breaks. I can't interpret the error message. What did I do wrong?
Error:


TypeError: Cannot convert Type TensorType(int32, scalar) (of Variable Subtensor{int64}.0) into Type TensorType(int32, vector). You can try to manually convert Subtensor{int64}.0 into a TensorType(int32, vector).


This error occurs deep in the Theano code. The conflicting line in my program is:

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index],
        y: train_set_y[index]  # <---------------HERE
    }
)


Apparently there is a mismatch between the dimensions of y and the training data.
My complete code on pastebin: http://pastebin.com/U5jYitk2
The complete error message on pastebin: http://pastebin.com/hUQJhfNM

Concise question:
What is the correct way to give training data to an MLP in Theano?
Where is my mistake?

I copied most of the code from the tutorial. Notable changes (probable causes of the error) are:


  • training data for y is not a matrix. I think this is right, because the output of my network is just a scalar value

  • input of the first layer is a vector. This variable is named x.

  • Access of the training data does not use slicing. In the tutorial the training data is very complex and I find it hard to read the data access code. I believe that x should be a row of the data-matrix. This is how I implemented it.



UPDATE:
I used Amir's code. It looks very good, thank you.

But it creates an error, too. The last loop is out-of-bounds:


/usr/bin/python3.4 /home/lhk/programming/sk/mlp/mlp/Layer.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
    train_model(i)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
    raise exc_type(exc_value).with_traceback(exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.


Line 113 is this one:

# train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)  # <-----------------HERE


I believe this is because the indexing of the training data uses index and index + 1. Why is that necessary? One row should be one training sample, and one row is train_set_x[index].

Edit: I debugged the code. Without slicing, indexing returns a 1-d array; with slicing it returns a 2-d array. A 1-d array should be incompatible with the matrix x.
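
A small check (reusing the question's train_set_x setup) reproduces this difference between plain indexing and slicing:

import numpy
import theano

data_x = numpy.matrix([[0, 0], [1, 0], [0, 1], [1, 1]])
train_set_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX),
                            borrow=True)

print(train_set_x[0].eval().shape)    # (2,)   -> 1-d vector, incompatible with T.matrix
print(train_set_x[0:1].eval().shape)  # (1, 2) -> a 2-d mini-batch of one row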

But while I did this, I found another strange problem:
I added this code to look at the effect of the training:

print("before")
print(classifier.W.get_value())
print(classifier.b.get_value())

for i in range(3):
    train_model(i)

print("after")
print(classifier.W.get_value())
print(classifier.b.get_value())

before
[[ 0.]
 [ 0.]]
[ 0.]
after
[[ 0.]
 [ 0.]]
[ 0.]


This makes sense, since the first three samples have 0 as correct output.
If I change the order and move the training sample (1,1),1 to the front, the program crashes.


before
[[ 0.]
 [ 0.]]
[ 0.]

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 121, in <module>
    train_model(i)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
    raise exc_type(exc_value).with_traceback(exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.


UPDATE

I installed Python 2.7 with Theano and ran the code again. The same error occurs. I also added verbose exception handling. Here is the output:

/usr/bin/python2.7 /home/lhk/programming/sk/mlp/mlp/Layer.py
Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
    train_model(i)
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 485, in streamline_default_f
    raise_with_op(node, thunk)
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 481, in streamline_default_f
    thunk()
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/op.py", line 768, in rval
    r = p(n, [x[0] for x in i], o)
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/tensor/nnet/nnet.py", line 896, in perform
    nll[i] = -row[y_idx[i]] + m + numpy.log(sum_j)
IndexError: index 1 is out of bounds for axis 0 with size 1
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Subtensor{int32:int32:}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

Debugprint of the apply node:
CrossentropySoftmaxArgmax1HotWithBias.0 [@A] <TensorType(float64, vector)> ''
|Dot22 [@B] <TensorType(float64, matrix)> ''
| |Subtensor{int32:int32:} [@C] <TensorType(float64, matrix)> ''
| | |<TensorType(float64, matrix)> [@D] <TensorType(float64, matrix)>
| | |ScalarFromTensor [@E] <int32> ''
| | | |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
| | |ScalarFromTensor [@G] <int32> ''
| | |Elemwise{add,no_inplace} [@H] <TensorType(int32, scalar)> ''
| | |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
| | |TensorConstant{1} [@I] <TensorType(int8, scalar)>
| |W [@J] <TensorType(float64, matrix)>
|b [@K] <TensorType(float64, vector)>
|Subtensor{int32:int32:} [@L] <TensorType(int32, vector)> ''
|Elemwise{Cast{int32}} [@M] <TensorType(int32, vector)> ''
| |<TensorType(float64, vector)> [@N] <TensorType(float64, vector)>
|ScalarFromTensor [@E] <int32> ''
|ScalarFromTensor [@G] <int32> ''
CrossentropySoftmaxArgmax1HotWithBias.1 [@A] <TensorType(float64, matrix)> ''
CrossentropySoftmaxArgmax1HotWithBias.2 [@A] <TensorType(int32, vector)> ''

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.

Process finished with exit code 1


UPDATE:

I looked at the training data again. Any sample with 1 as label will produce the above error.

data_y = numpy.array([1,
1,
1,
1])


With the labels above, train_model(i) crashes for every i in (0, 1, 2, 3).
Apparently there is interference between the indexing of the samples and the sample content.
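
This interference can be reproduced in plain numpy with the indexing used in negative_log_likelihood: the cost picks column y of each output row, and with n_out = 1 there is only column 0, so a label of 1 indexes past the end (the compiled Theano op reports this as the "y_i value out of bounds" / IndexError above). A minimal sketch:

import numpy

output = numpy.array([[1.0]])   # softmax output of one sample when n_out = 1
y = numpy.array([1])            # label 1, as for the sample [1, 1]

# same indexing as T.log(output)[T.arange(y.shape[0]), y] in negative_log_likelihood:
numpy.log(output)[numpy.arange(y.shape[0]), y]
# IndexError: index 1 is out of bounds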

UPDATE:
The problem is indeed, as Amir indicated, the dimension of the output layer. I had the misconception that I could train the network to encode the output of the function "logical and" directly in the single output neuron. While that is certainly possible, this training approach uses the y value as an index to choose the output node which should have the highest value. After changing the output size to two, the code works. And with enough training, the errors for all cases indeed become zero.

Answer

Here is the working code for your problem. There were quite a few little bugs in your code. The one causing the error you were getting was defining b as an n_in by n_out matrix instead of simply defining it as an n_out vector. The update part was also defined with brackets [] as opposed to parentheses ().

Also, the index was defined as an int32 symbolic scalar (this is not very important). The other important change was to compile the functions with the correct indexing: the way you had used the index would not let the functions compile, for some reason. You had also declared your input as a vector. That way you cannot train the model with mini-batches or a full batch, so it is safer to declare it as a symbolic matrix; to use a vector, you would have to store your inputs as vectors rather than as a matrix in the shared variable just to make the program run, which is a headache. Finally, you had used classifier.errors(y) to compile your validation function, although you had removed the errors function from the Layer class.

import theano
import theano.tensor as T
import numpy


class Layer(object):
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.x = input
        self.W = theano.shared(
                value=numpy.zeros(
                        (n_in, n_out),
                        dtype=theano.config.floatX
                ),
                name="W",
                borrow=True
        )
        self.b = theano.shared(
                value=numpy.zeros(n_out,
                                  dtype=theano.config.floatX),
                name="b",
                borrow=True
        )

        self.output = T.nnet.softmax(T.dot(self.x, self.W) + self.b)
        self.params = [self.W, self.b]
        self.input = input


def y_pred(output):
    return T.argmax(output, axis=1)


def negative_log_likelihood(output, y):
    return -T.mean(T.log(output)[T.arange(y.shape[0]), y])


def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()

data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])

data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                         dtype=theano.config.floatX),
                         borrow=True)

train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                         dtype=theano.config.floatX),
                         borrow=True),"int32")

x = T.matrix("x")  # data
y = T.ivector("y")  # labels

classifier = Layer(input=x, n_in=2, n_out=1)

cost = negative_log_likelihood(classifier.output, y)

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
index = T.iscalar()

learning_rate = 0.15

updates = (
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
)

train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index:index + 1],
            y: train_set_y[index:index + 1]
        }
)
validate_model = theano.function(
        inputs=[index],
        outputs=errors(classifier.output, y),
        givens={
            x: train_set_x[index:index + 1],
            y: train_set_y[index:index + 1]
        }
)

#train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)
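
As the question's last update notes, this softmax version also handles the task once the layer gets one output unit per class (n_out = 2). A sketch of that variant, reusing Layer, y_pred, negative_log_likelihood, train_set_x, train_set_y, x and y from the snippet above; the predict helper and the epoch count are my additions, not part of the original code:

classifier = Layer(input=x, n_in=2, n_out=2)   # one output unit per class: "0" and "1"

cost = negative_log_likelihood(classifier.output, y)
g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
index = T.iscalar()
learning_rate = 0.15

train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=[(classifier.W, classifier.W - learning_rate * g_W),
                 (classifier.b, classifier.b - learning_rate * g_b)],
        givens={
            x: train_set_x[index:index + 1],
            y: train_set_y[index:index + 1]
        }
)

# argmax over the two class scores gives the predicted label for all four rows
predict = theano.function([], y_pred(classifier.output), givens={x: train_set_x})

for epoch in range(1000):
    for i in range(train_set_x.shape[0].eval()):
        train_model(i)

print(predict())   # with enough training this approaches [0 0 0 1]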

Here's the updated code. Please note that the main difference between the code above and the code below is that the latter is suitable for a binary problem, while the former only works if you have a multi-class problem, which is not the case here. I am keeping both code snippets here for educational purposes. Please read the comments to get a sense of the problem with the code above and how I went about resolving it.

import theano
import theano.tensor as T
import numpy


class Layer(object):
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.x = input
        self.W = theano.shared(
                value=numpy.zeros(
                        (n_in, n_out),
                        dtype=theano.config.floatX
                ),
                name="W",
                borrow=True
        )
        self.b = theano.shared(
                value=numpy.zeros(n_out,
                                  dtype=theano.config.floatX),
                name="b",
                borrow=True
        )

        self.output = T.reshape(T.nnet.sigmoid(T.dot(self.x, self.W) + self.b), (input.shape[0],))
        self.params = [self.W, self.b]
        self.input = input


def y_pred(output):
    return output


def negative_log_likelihood(output, y):
    return T.mean(T.nnet.binary_crossentropy(output,y))


def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()

data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])

data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                         dtype=theano.config.floatX),
                         borrow=True)

train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                         dtype=theano.config.floatX),
                         borrow=True),"int32")

x = T.matrix("x")  # data
y = T.ivector("y")  # labels

classifier = Layer(input=x, n_in=2, n_out=1)

cost = negative_log_likelihood(classifier.output, y)

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
index = T.iscalar()

learning_rate = 0.15

updates = (
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
)

train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index:index+1],
            y: train_set_y[index:index+1]
        }
)
validate_model = theano.function(
        inputs=[index],
        outputs=errors(classifier.output, y),
        givens={
            x: train_set_x[index:index + 1],
            y: train_set_y[index:index + 1]
        }
)

#train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)
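
To see whether this binary version actually learns AND, one can compile a small prediction function and train for more epochs. This is a usage sketch only; the 0.5 threshold for reading off a class label is my own assumption and is not part of the code above:

# evaluate the sigmoid output for all four training rows at once
predict = theano.function([], classifier.output, givens={x: train_set_x})

for epoch in range(1000):
    for i in range(train_set_x.shape[0].eval()):
        train_model(i)

print(predict())          # four probabilities, one per row of data_x
print(predict() > 0.5)    # assumed decision rule; expected to approach [False False False True]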