ali_m - 6 months ago
Python Question

Why am I allowed to pickle instancemethods that are Theano functions, but not normal instancemethods?

In the process of using joblib to parallelize some model-fitting code involving Theano functions, I've stumbled across some behavior that seems odd to me.

Consider this very simplified example:

```python
from joblib import Parallel, delayed
import theano
from theano import tensor as te
import numpy as np


class TheanoModel(object):
    def __init__(self):
        X = te.dvector('X')
        Y = (X ** te.log(X ** 2)).sum()
        self.theano_get_Y = theano.function([X], Y)

    def get_Y(self, x):
        return self.theano_get_Y(x)


def run(niter=100):
    x = np.random.randn(1000)
    model = TheanoModel()
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')

    # this fails with `TypeError: can't pickle instancemethod objects`...
    results = pool(delayed(model.get_Y)(x) for _ in xrange(niter))

    # # ... but this works! Why?
    # results = pool(delayed(model.theano_get_Y)(x) for _ in xrange(niter))


if __name__ == '__main__':
    run()
```


I understand why the first case fails, since `.get_Y()` is clearly an instancemethod of `TheanoModel`. What I don't understand is why the second case works, since `X`, `Y` and `theano_get_Y()` are only declared within the `__init__()` method of `TheanoModel`. `theano_get_Y()` can't be evaluated until the `TheanoModel` instance has been created. Surely, then, it should also be considered an instancemethod, and should therefore be unpickleable? In fact, it still works even if I explicitly declare `X` and `Y` to be attributes of the `TheanoModel` instance.

Can anyone explain what's going on here?




Update



Just to illustrate why I think this behaviour is particularly weird, here are a few examples of some other callable member objects that don't take `self` as the first argument:

```python
from joblib import Parallel, delayed
import theano
from theano import tensor as te
import numpy as np


class TheanoModel(object):
    def __init__(self):
        X = te.dvector('X')
        Y = (X ** te.log(X ** 2)).sum()
        self.theano_get_Y = theano.function([X], Y)

        def square(x):
            return x ** 2

        self.member_function = square
        self.static_method = staticmethod(square)
        self.lambda_function = lambda x: x ** 2


def run(niter=100):
    x = np.random.randn(1000)
    model = TheanoModel()
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')

    # # not allowed: `TypeError: can't pickle function objects`
    # results = pool(delayed(model.member_function)(x) for _ in xrange(niter))

    # # not allowed: `TypeError: can't pickle function objects`
    # results = pool(delayed(model.lambda_function)(x) for _ in xrange(niter))

    # # also not allowed: `TypeError: can't pickle staticmethod objects`
    # results = pool(delayed(model.static_method)(x) for _ in xrange(niter))

    # but this is totally fine!?
    results = pool(delayed(model.theano_get_Y)(x) for _ in xrange(niter))


if __name__ == '__main__':
    run()
```


None of them are pickleable, with the exception of the `theano.function`!
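The failures for the nested function and the lambda come from how `pickle` handles plain functions: it does not serialize their code, only a reference to the function's module and name, which must be resolvable again at load time. A function defined inside `__init__`, or a lambda, has no such importable module-level name. A small sketch with the standard-library `pickle` module; the exact exception type and message differ between Python versions, so the hypothetical helper `can_pickle` simply catches the likely ones:

```python
import pickle


def square(x):
    # Module-level function: pickled by reference as <module>.square.
    return x ** 2


def make_local_square():
    def local_square(x):
        # Nested function: has no importable module-level name.
        return x ** 2
    return local_square


def can_pickle(obj):
    """Return True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(square))               # True
print(can_pickle(make_local_square()))  # False
print(can_pickle(lambda x: x ** 2))     # False
```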

Answer

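Theano functions aren't Python functions. Instead, they are Python objects whose class defines `__call__`. This means you can call them just like a function, but internally they are instances of a custom class. Pickle handles such instances by storing a reference to their importable class together with their state, rather than trying to serialize any code, so as long as that state can be pickled, the object can be too. In consequence, you can pickle them.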
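The same mechanism can be seen without Theano. Below, a hypothetical `PowerFunction` class plays the role of a compiled Theano function: a module-level class whose instances define `__call__`. Pickle records a reference to the class plus the instance's `__dict__`, so the callable survives a round trip even though it was created at runtime. This is only a sketch of the general mechanism; Theano's own pickling of compiled functions may involve considerably more state.

```python
import pickle


class PowerFunction(object):
    """Hypothetical callable object standing in for a compiled function."""

    def __init__(self, exponent):
        self.exponent = exponent  # plain, picklable instance state

    def __call__(self, x):
        return x ** self.exponent


square = PowerFunction(2)
restored = pickle.loads(pickle.dumps(square))

print(callable(restored))               # True
print(restored(3))                      # 9
print(type(restored) is PowerFunction)  # True
```

The same idea suggests a workaround for the original `get_Y` case: hand joblib a picklable callable, such as a module-level function or an instance of a class with `__call__`, rather than a bound method.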