Herbert - 2 months ago 15

Python Question

I noticed that in Theano, when one creates a shared variable from a 1D numpy array, it becomes a vector, not a row:

```
import theano.tensor as T
import theano, numpy

shared_vector = theano.shared(numpy.zeros((10,)))
print(shared_vector.type)
# TensorType(float64, vector)
print(shared_vector.broadcastable)
# (False,)
```

The same goes for a 1 x N matrix: it becomes a matrix, not a row:

```
shared_vector = theano.shared(numpy.zeros((1, 10)))
print(shared_vector.type)
# TensorType(float64, matrix)
print(shared_vector.broadcastable)
# (False, False)
```

This is troublesome when I want to add an M x N matrix to a 1 x N row vector, because the shared variable is not broadcastable in the first dimension. First of all, this will not work:

```
row = T.row('row')
mat = T.matrix('matrix')
f = theano.function(
    [],
    mat + row,
    givens={
        mat: numpy.zeros((20, 10), dtype=numpy.float32),
        row: numpy.zeros((10,), dtype=numpy.float32)
    },
    on_unused_input='ignore'
)
```

With the error:

`TypeError: Cannot convert Type TensorType(float32, vector) (of Variable <TensorType(float32, vector)>) into Type TensorType(float32, row). You can try to manually convert <TensorType(float32, vector)> into a TensorType(float32, row).`

OK, that's clear: we can't assign vectors to rows. Unfortunately, this does not work either:

```
row = T.matrix('row')
mat = T.matrix('matrix')
f = theano.function(
    [],
    mat + row,
    givens={
        mat: numpy.zeros((20, 10), dtype=numpy.float32),
        row: numpy.zeros((1, 10), dtype=numpy.float32)
    },
    on_unused_input='ignore'
)
f()
```

With the error:

```
ValueError: Input dimension mis-match. (input[0].shape[0] = 20, input[1].shape[0] = 1)
Apply node that caused the error: Elemwise{add,no_inplace}(<TensorType(float32, matrix)>, <TensorType(float32, matrix)>)
Inputs types: [TensorType(float32, matrix), TensorType(float32, matrix)]
Inputs shapes: [(20, 10), (1, 10)]
Inputs strides: [(40, 4), (40, 4)]
Inputs values: ['not shown', 'not shown']

Backtrace when the node is created:
  File "<ipython-input-55-0f03bee478ec>", line 5, in <module>
    mat + row,

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
```

So we can't simply use a 1 x N matrix as a row either, because the first dimension of a 1 x N matrix is not marked as broadcastable.
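Both failures come down to the same rule: whether a dimension may be broadcast is decided by the variable's `broadcastable` pattern (part of its type), not by its runtime shape. The sketch below is a toy pure-Python model of that rule, not Theano's implementation; `elemwise_shape` is a hypothetical helper introduced only for illustration.

```python
def elemwise_shape(shape_a, bcast_a, shape_b, bcast_b):
    """Toy model of the shape check for an elementwise op on two inputs.

    A dimension is stretched only if its type-level flag is True;
    a runtime length of 1 alone is not enough.
    """
    out = []
    for i, (da, fa, db, fb) in enumerate(zip(shape_a, bcast_a, shape_b, bcast_b)):
        if fa and not fb:
            out.append(db)      # a is flagged broadcastable: follow b
        elif fb and not fa:
            out.append(da)      # b is flagged broadcastable: follow a
        elif da == db:
            out.append(da)      # no stretching needed, lengths agree
        else:
            raise ValueError(
                "Input dimension mis-match. "
                "(input[0].shape[%d] = %d, input[1].shape[%d] = %d)" % (i, da, i, db)
            )
    return tuple(out)


# matrix + row: the row's first dimension is flagged broadcastable.
print(elemwise_shape((20, 10), (False, False), (1, 10), (True, False)))

# matrix + (1 x N matrix): same runtime shapes, but no flag, so it fails.
try:
    elemwise_shape((20, 10), (False, False), (1, 10), (False, False))
except ValueError as err:
    print(err)
```

In this model the only difference between the working and the failing call is the `(True, False)` pattern, which is what distinguishes a row from a matrix.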

The question remains: what *can* we do? How can I create a shared variable of type row, so that it is broadcastable in matrix-row addition?

Answer

An alternative to using `reshape(1, N)` is to use `dimshuffle('x', 0)`, as described in the documentation.

Here's a demo of the two approaches:

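```
import numpy
import theano

x = theano.shared(numpy.arange(10))
print(x)
# Insert a broadcastable axis in front (row) or at the end (col).
print(x.dimshuffle('x', 0).type)
print(x.dimshuffle(0, 'x').type)
# The same two shapes obtained through reshape.
print(x.reshape((1, x.shape[0])).type)
print(x.reshape((x.shape[0], 1)).type)
# Compile both variants and inspect the resulting graph.
f = theano.function([], outputs=[x, x.dimshuffle('x', 0), x.reshape((1, x.shape[0]))])
theano.printing.debugprint(f)
```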

This prints

```
<TensorType(int32, vector)>
TensorType(int32, row)
TensorType(int32, col)
TensorType(int32, row)
TensorType(int32, col)
DeepCopyOp [@A] ''   2
 |<TensorType(int32, vector)> [@B]
DeepCopyOp [@C] ''   4
 |InplaceDimShuffle{x,0} [@D] ''   1
   |<TensorType(int32, vector)> [@B]
DeepCopyOp [@E] ''   6
 |Reshape{2} [@F] ''   5
   |<TensorType(int32, vector)> [@B]
   |MakeVector{dtype='int64'} [@G] ''   3
     |TensorConstant{1} [@H]
     |Shape_i{0} [@I] ''   0
       |<TensorType(int32, vector)> [@B]
```

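This shows that `dimshuffle` is probably preferable, as it involves less work than `reshape`: the compiled graph needs a single `InplaceDimShuffle` node, whereas the `reshape` version adds `Shape_i`, `MakeVector`, and `Reshape` nodes.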
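As a rough illustration of why `dimshuffle('x', 0)` yields a row and `dimshuffle(0, 'x')` a column, the sketch below models how a dimshuffle pattern maps onto the `broadcastable` flags. It is a toy pure-Python model rather than Theano code; `dimshuffle_broadcastable` and `TYPE_NAMES` are hypothetical names introduced here.

```python
# Names that appear in the printed types for common broadcastable patterns.
TYPE_NAMES = {
    (False,): "vector",
    (True, False): "row",
    (False, True): "col",
    (False, False): "matrix",
}


def dimshuffle_broadcastable(broadcastable, pattern):
    """Return the broadcastable flags after applying a dimshuffle pattern.

    'x' inserts a new length-1 axis, which is always broadcastable;
    an integer i reuses input axis i together with its flag.
    """
    return tuple(True if p == "x" else broadcastable[p] for p in pattern)


vector = (False,)  # pattern of a shared variable built from a 1D array
print(TYPE_NAMES[dimshuffle_broadcastable(vector, ("x", 0))])  # row
print(TYPE_NAMES[dimshuffle_broadcastable(vector, (0, "x"))])  # col
```

Because the new axis is created by the pattern itself, the resulting type is known to be broadcastable in that dimension without inspecting any runtime shape.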