lhk - 6 months ago 47

Python Question

I'm trying to create a simple multilayer Perceptron (MLP) for "logical and" in Theano.

There is one layer between input and output. The structure is this:

2 value input -> multiply with weights, add bias -> softmax -> 1 value output

The change in dimension is caused by the weights matrix.
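One property of this structure can be checked by hand: softmax normalizes across the output units, so with a single output unit the result is always 1.0, whatever the weights are. A minimal pure-Python sketch (the `softmax` helper is written here for illustration and is not Theano's implementation):

```python
import math

def softmax(values):
    """Normalize a list of scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]

# With one output unit the "distribution" has a single entry, which is always 1.0.
single = [softmax([z])[0] for z in (-3.0, 0.0, 5.0)]

# With two output units the probabilities depend on the scores.
pair = softmax([0.0, 2.0])
```

A softmax layer therefore needs one output unit per class before its output can carry any information.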

The implementation is based on this tutorial: http://deeplearning.net/tutorial/logreg.html

This is my class for the Layer:

```
class Layer():
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name="W",
            borrow=True
        )
        self.b = theano.shared(
            value=numpy.zeros((n_in, n_out),
                              dtype=theano.config.floatX),
            name="b",
            borrow=True
        )
        self.output = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.params = (self.W, self.b)
        self.input = input
```

The class is meant to be modular. Instead of just one layer, I want to be able to add multiple layers.

Therefore the functions for prediction, cost and errors are outside of the class (as opposed to the tutorial):

```
def y_pred(output):
    return T.argmax(output, axis=1)


def negative_log_likelihood(output, y):
    return -T.mean(T.log(output)[T.arange(y.shape[0]), y])


def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()
```

The logical and has 4 training cases:

- [0,0] -> 0
- [1,0] -> 0
- [0,1] -> 0
- [1,1] -> 1
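For reference, a single linear unit with a threshold can represent this table. The sketch below uses hand-picked weights (1, 1) and a bias of -1.5; these values and the `linear_unit` helper are illustrative assumptions, not taken from the code in this question:

```python
# The four training cases of logical and: (inputs, expected output).
cases = [([0, 0], 0), ([1, 0], 0), ([0, 1], 0), ([1, 1], 1)]

weights = [1.0, 1.0]  # hand-picked for illustration
bias = -1.5           # hand-picked for illustration

def linear_unit(x):
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

predictions = [linear_unit(x) for x, _ in cases]
```

Only the case [1,1] pushes the weighted sum above zero, so a network with no hidden layer is enough for this function.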

Here is the setup of the classifier and the functions for training and evaluating:

```
data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])

data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                                          dtype=theano.config.floatX),
                            borrow=True)

train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                                                 dtype=theano.config.floatX),
                                   borrow=True), "int32")

x = T.vector("x", theano.config.floatX)  # data
y = T.ivector("y")  # labels

classifier = Layer(input=x, n_in=2, n_out=1)

cost = negative_log_likelihood(classifier.output, y)

g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)

index = T.lscalar()

learning_rate = 0.15

updates = [
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
]

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index],
        y: train_set_y[index]
    }
)

validate_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: train_set_x[index],
        y: train_set_y[index]
    }
)
```

I tried to follow the conventions. Each row in the data matrix is a training sample. Each training sample is matched to the correct output.

Unfortunately the code breaks. I can't interpret the error message. What did I do wrong?

Error:

TypeError: Cannot convert Type TensorType(int32, scalar) (of Variable Subtensor{int64}.0) into Type TensorType(int32, vector). You can try to manually convert Subtensor{int64}.0 into a TensorType(int32, vector).

This error occurs deep in the Theano code. The conflicting line in my program is:

```
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index],
        y: train_set_y[index]  # <---------------HERE
    }
)
```

Apparently there is a mismatch between the dimensions of y and the training data.

My complete code on pastebin: http://pastebin.com/U5jYitk2

The complete error message on pastebin: http://pastebin.com/hUQJhfNM

What is the correct way to give training data to an MLP in Theano?

Where is my mistake?

I copied most of the code of the tutorial. Notable changes (probable causes of the error) are:

- training data for y is not a matrix. I think this is right, because the output of my network is just a scalar value
- input of the first layer is a vector. This variable is named x.
- Access of the training data does not use slicing. In the tutorial the training data is very complex and I find it hard to read the data access code. I believe that x should be a row of the data-matrix. This is how I implemented it.

I used the code of Amir. Looks very good, thank you.

But it creates an error, too. The last loop is out-of-bounds:

```
/usr/bin/python3.4 /home/lhk/programming/sk/mlp/mlp/Layer.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
    train_model(i)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
    raise exc_type(exc_value).with_traceback(exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
```

Line 113 is this one:

```
#train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)  # <-----------------HERE
```

I believe this is because the indexing of the training data uses `index` and `index+1` rather than `train_set_x[index]`.

Edit: I debugged the code. Without slicing, the indexing returns a 1d array; with slicing, it returns a 2d array. A 1d array should be incompatible with the matrix x.
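The difference between integer indexing and slicing can be reproduced with plain nested lists. NumPy arrays and Theano shared variables follow the same rule: an integer index removes one dimension, a slice keeps it. A small sketch (the `depth` helper is hypothetical and only counts nesting levels):

```python
data_x = [[0, 0], [1, 0], [0, 1], [1, 1]]
data_y = [0, 0, 0, 1]

def depth(value):
    """Count list nesting levels: 0 for a scalar, 1 for a vector, 2 for a matrix."""
    levels = 0
    while isinstance(value, list):
        levels += 1
        value = value[0]
    return levels

index = 3
row = data_x[index]                # [1, 1]   -> vector, one dimension lost
batch = data_x[index:index + 1]    # [[1, 1]] -> still a matrix with one row
label = data_y[index]              # 1        -> scalar
labels = data_y[index:index + 1]   # [1]      -> still a vector with one entry
```

This matches the first error message: `train_set_y[index]` is a scalar and cannot stand in for a variable declared as `T.ivector`, while `train_set_y[index:index + 1]` is a one-element vector.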

But while I did this, I found another strange problem:

I added this code to look at the effect of the training:

```
print("before")
print(classifier.W.get_value())
print(classifier.b.get_value())

for i in range(3):
    train_model(i)

print("after")
print(classifier.W.get_value())
print(classifier.b.get_value())
```

```
before
[[ 0.]
 [ 0.]]
[ 0.]
after
[[ 0.]
 [ 0.]]
[ 0.]
```

This makes sense, since the first three samples have 0 as correct output.

If I change the order and move the training sample (1,1),1 to the front, the program crashes.

```
before
[[ 0.]
 [ 0.]]
[ 0.]
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 121, in <module>
    train_model(i)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/link.py", line 206, in raise_with_op
    raise exc_type(exc_value).with_traceback(exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: y_i value out of bounds
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Elemwise{Cast{int32}}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
```

I installed Python2.7 with Theano and tried to run the code again. The same error occurs. And I added verbose exception handling. Here is the output:

```
/usr/bin/python2.7 /home/lhk/programming/sk/mlp/mlp/Layer.py
Traceback (most recent call last):
  File "/home/lhk/programming/sk/mlp/mlp/Layer.py", line 113, in <module>
    train_model(i)
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 485, in streamline_default_f
    raise_with_op(node, thunk)
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/link.py", line 481, in streamline_default_f
    thunk()
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/gof/op.py", line 768, in rval
    r = p(n, [x[0] for x in i], o)
  File "/home/lhk/.local/lib/python2.7/site-packages/theano/tensor/nnet/nnet.py", line 896, in perform
    nll[i] = -row[y_idx[i]] + m + numpy.log(sum_j)
IndexError: index 1 is out of bounds for axis 0 with size 1
Apply node that caused the error: CrossentropySoftmaxArgmax1HotWithBias(Dot22.0, b, Subtensor{int32:int32:}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, vector), TensorType(int32, vector)]
Inputs shapes: [(1, 1), (1,), (1,)]
Inputs strides: [(8, 8), (8,), (4,)]
Inputs values: [array([[ 0.]]), array([ 0.]), array([1], dtype=int32)]

Debugprint of the apply node:
CrossentropySoftmaxArgmax1HotWithBias.0 [@A] <TensorType(float64, vector)> ''
|Dot22 [@B] <TensorType(float64, matrix)> ''
| |Subtensor{int32:int32:} [@C] <TensorType(float64, matrix)> ''
| | |<TensorType(float64, matrix)> [@D] <TensorType(float64, matrix)>
| | |ScalarFromTensor [@E] <int32> ''
| | | |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
| | |ScalarFromTensor [@G] <int32> ''
| | |Elemwise{add,no_inplace} [@H] <TensorType(int32, scalar)> ''
| | |<TensorType(int32, scalar)> [@F] <TensorType(int32, scalar)>
| | |TensorConstant{1} [@I] <TensorType(int8, scalar)>
| |W [@J] <TensorType(float64, matrix)>
|b [@K] <TensorType(float64, vector)>
|Subtensor{int32:int32:} [@L] <TensorType(int32, vector)> ''
|Elemwise{Cast{int32}} [@M] <TensorType(int32, vector)> ''
| |<TensorType(float64, vector)> [@N] <TensorType(float64, vector)>
|ScalarFromTensor [@E] <int32> ''
|ScalarFromTensor [@G] <int32> ''
CrossentropySoftmaxArgmax1HotWithBias.1 [@A] <TensorType(float64, matrix)> ''
CrossentropySoftmaxArgmax1HotWithBias.2 [@A] <TensorType(int32, vector)> ''

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.

Process finished with exit code 1
```

I looked at the training data again. Any sample with 1 as label will produce the above error.

```
data_y = numpy.array([1,
                      1,
                      1,
                      1])
```

The above sample labels will crash for every train_model(i) for i in (0,1,2,3).

Apparently there is an interference between indexing the samples and the sample content.

The problem is indeed, as Amir's comment indicated, the dimension of the output layer. I had the misconception that I could train the network to encode the output of the function "logical and" directly in the output neuron. While this is certainly possible, this training approach uses the y value as an index to choose the output node which should have the highest value. After changing the output size to two, the code works. And with enough training, the errors for all cases indeed become zero.
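The out-of-bounds error can be reproduced without Theano. The negative log-likelihood used here picks, for every sample, the log-probability in the column given by the label, so a label of 1 needs at least two output columns. A pure-Python sketch (the function below mirrors the indexing in `T.log(output)[T.arange(y.shape[0]), y]` but is not Theano code):

```python
import math

def negative_log_likelihood(probabilities, labels):
    """Mean of -log(p[i][labels[i]]) over all samples."""
    picked = [math.log(row[label]) for row, label in zip(probabilities, labels)]
    return -sum(picked) / len(picked)

# One output column: label 0 is a valid index, label 1 is not.
one_column = [[1.0]]
cost_label_0 = negative_log_likelihood(one_column, [0])
try:
    negative_log_likelihood(one_column, [1])
    failed = False
except IndexError:
    failed = True

# Two output columns: both labels are valid indices.
two_columns = [[0.5, 0.5]]
cost_label_1 = negative_log_likelihood(two_columns, [1])
```

With one column the cost for label 0 is zero and can never change, which also explains why the weights stayed at zero for the first three samples.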

Answer

Here is the working code for your problem. There were quite a few small bugs in your code. One of them was defining `b` as an `n_in` by `n_out` matrix instead of simply defining it as an `n_out` vector. The updates were also defined in brackets `[]` as opposed to parentheses `()`.

I also defined the index as an `int32` symbolic scalar (this is not very important). The other important change was to compile the functions with the correct indexing. `train_set_y[index]` is a scalar, while `y` is declared as an `ivector`, so the substitution in `givens` is rejected with the `TypeError` you posted; the slice `train_set_y[index:index + 1]` is a one-element vector and compiles. You had also declared your input as a vector. That way you cannot train the model with mini-batches or a full batch, so it is safer to declare it as a symbolic matrix. To use a vector, you would have to store your inputs as vectors instead of a matrix in the shared variable, which is more trouble than it is worth. Finally, you had used `classifier.errors(y)` to compile your validation function although you had removed the function `errors` from the `Layer` class.

```
import theano
import theano.tensor as T
import numpy


class Layer(object):
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.x = input
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name="W",
            borrow=True
        )
        # the bias is a vector with one entry per output unit
        self.b = theano.shared(
            value=numpy.zeros(n_out,
                              dtype=theano.config.floatX),
            name="b",
            borrow=True
        )
        self.output = T.nnet.softmax(T.dot(self.x, self.W) + self.b)
        self.params = [self.W, self.b]
        self.input = input


def y_pred(output):
    return T.argmax(output, axis=1)


def negative_log_likelihood(output, y):
    return -T.mean(T.log(output)[T.arange(y.shape[0]), y])


def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()


data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])
data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                                          dtype=theano.config.floatX),
                            borrow=True)
train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                                                 dtype=theano.config.floatX),
                                   borrow=True), "int32")

x = T.matrix("x")  # data, one sample per row
y = T.ivector("y")  # labels

# NOTE: with softmax and integer labels, n_out has to equal the number of
# classes (2 here). n_out=1 raises "y_i value out of bounds" for label 1,
# as reported in the question.
classifier = Layer(input=x, n_in=2, n_out=1)
cost = negative_log_likelihood(classifier.output, y)
g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
index = T.iscalar()
learning_rate = 0.15
updates = (
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
)

# slicing with index:index + 1 keeps x a matrix and y a vector
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index:index + 1],
        y: train_set_y[index:index + 1]
    }
)
validate_model = theano.function(
    inputs=[index],
    outputs=errors(classifier.output, y),
    givens={
        x: train_set_x[index:index + 1],
        y: train_set_y[index:index + 1]
    }
)

# train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)
```

**Here's the updated code**. Please note that the main difference between the code above and the code below is that the latter is suitable for a binary problem, while the former only works if you have a multi-class problem, which is not the case here. I am including both code snippets for educational purposes. Please read the comments to get a sense of the problem with the code above and how I went about resolving it.

```
import theano
import theano.tensor as T
import numpy


class Layer(object):
    """
    this is a layer in the mlp
    it's not meant to predict the outcome hence it does not compute a loss.
    apply the functions for negative log likelihood = cost on the output of the last layer
    """

    def __init__(self, input, n_in, n_out):
        self.x = input
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name="W",
            borrow=True
        )
        self.b = theano.shared(
            value=numpy.zeros(n_out,
                              dtype=theano.config.floatX),
            name="b",
            borrow=True
        )
        # a single sigmoid unit; reshape (batch, 1) to (batch,) to match y
        self.output = T.reshape(T.nnet.sigmoid(T.dot(self.x, self.W) + self.b), (input.shape[0],))
        self.params = [self.W, self.b]
        self.input = input


def y_pred(output):
    # threshold the sigmoid probability at 0.5 so that errors() compares
    # 0/1 class labels instead of raw probabilities
    return T.cast(T.gt(output, 0.5), 'int32')


def negative_log_likelihood(output, y):
    return T.mean(T.nnet.binary_crossentropy(output, y))


def errors(output, y):
    # check if y has same dimension of y_pred
    if y.ndim != y_pred(output).ndim:
        raise TypeError(
            'y should have the same shape as self.y_pred',
            ('y', y.type, 'y_pred', y_pred(output).type)
        )
    # check if y is of the correct datatype
    if y.dtype.startswith('int'):
        # the T.neq operator returns a vector of 0s and 1s, where 1
        # represents a mistake in prediction
        return T.mean(T.neq(y_pred(output), y))
    else:
        raise NotImplementedError()


data_x = numpy.matrix([[0, 0],
                       [1, 0],
                       [0, 1],
                       [1, 1]])
data_y = numpy.array([0,
                      0,
                      0,
                      1])

train_set_x = theano.shared(numpy.asarray(data_x,
                                          dtype=theano.config.floatX),
                            borrow=True)
train_set_y = T.cast(theano.shared(numpy.asarray(data_y,
                                                 dtype=theano.config.floatX),
                                   borrow=True), "int32")

x = T.matrix("x")  # data, one sample per row
y = T.ivector("y")  # labels

# one output unit is enough here because the sigmoid encodes P(label == 1)
classifier = Layer(input=x, n_in=2, n_out=1)
cost = negative_log_likelihood(classifier.output, y)
g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
index = T.iscalar()
learning_rate = 0.15
updates = (
    (classifier.W, classifier.W - learning_rate * g_W),
    (classifier.b, classifier.b - learning_rate * g_b)
)

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index:index + 1],
        y: train_set_y[index:index + 1]
    }
)
validate_model = theano.function(
    inputs=[index],
    outputs=errors(classifier.output, y),
    givens={
        x: train_set_x[index:index + 1],
        y: train_set_y[index:index + 1]
    }
)

# train the model
for i in range(train_set_x.shape[0].eval()):
    train_model(i)
```
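To see the binary formulation converge, here is a framework-free sketch of the same idea: one sigmoid unit trained with per-sample gradient descent on the binary cross-entropy, using the same learning rate of 0.15. The number of epochs (2000) and all function names are assumptions chosen for illustration; this is not the Theano code above.

```python
import math

data_x = [[0, 0], [1, 0], [0, 1], [1, 1]]
data_y = [0, 0, 0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, x):
    """P(label == 1) for one input row."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(epochs=2000, learning_rate=0.15):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(data_x, data_y):
            p = predict_probability(weights, bias, x)
            # For sigmoid + binary cross-entropy, the gradient with respect
            # to the pre-activation is simply (p - y).
            error = p - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = train()
predictions = [1 if predict_probability(weights, bias, x) > 0.5 else 0
               for x in data_x]
```

A single pass over the four samples, as in the loop above, is not enough to fit the data; repeating the loop over many epochs is what drives the errors to zero.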