OliasailO OliasailO - 1 month ago 7
Python Question

A Faster Nested Tuple to List and Back

I'm trying to perform tuple to list and list to tuple conversions on nested sequences of unknown depth and shape. The calls are being made hundreds of thousands of times, which is why I'm trying to squeeze out as much speed as possible.

Any help is much appreciated.

Here's what I have so far...

def listify(self, seq, was, toBe):
temp = []
a = temp.append
for g in seq:
if type(g) == was:
a(self.listify(g, was, toBe))
return toBe(temp)

And a call for tuple to list would look like this:

self.listify((...), tuple, list)

Edit: Yeah, I totally missed both the enumerate (from an old implementation) and forgot to type the else part.

Thanks for the help both of you. I'll probably go with the coroutines.

Answer Source

I have been working with coroutines quiet a lot lately. The advantage would be that you reduce the overhead of the method calls. Sending a new value into a coroutine is faster than calling a function. While you can not make a recursive coroutine, it will throw a ValueError: generator already executing but you could make a pool of coroutine workers - you need one worker for every level of the tree. I have made some test code that works, but have not looked at the timing issues yet.

def coroutine(func):
    """ A helper function decorator from Beazley"""
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        return g
    return start

def cotuple2list():
    """This does the work"""
    result = None
    while True:
        (tup, co_pool) = (yield result)
        result = list(tup)
        # I don't like using append. So I am changing the data in place.
        for (i,x) in enumerate(result):
            # consider using "if hasattr(x,'__iter__')"
            if isinstance(x,tuple):
                result[i] = co_pool[0].send((x, co_pool[1:]))

def colist2tuple():
    """This does the work"""
    result = None
    while True:
        (lst, co_pool) = (yield result)
        # I don't like using append so I am changing the data in place...
        for (i,x) in enumerate(lst):
            # consider using "if hasattr(x,'__iter__')"
            if isinstance(x,list):
                lst[i] = co_pool[0].send((x, co_pool[1:]))
        result = tuple(lst)

Pure python alternative from HYRY's post:

def list2tuple(a):
    return tuple((list2tuple(x) if isinstance(x, list) else x for x in a))
def tuple2list(a):
    return list((tuple2list(x) if isinstance(x, tuple) else x for x in a))

Make a pool of coroutines - this is a hack of a pool, but it works:

# Make Coroutine Pools
colist2tuple_pool = [colist2tuple() for i in xrange(20) ]
cotuple2list_pool = [cotuple2list() for i in xrange(20) ]

Now do some timing - comparing to :

def make_test(m, n):
    # Test data function taken from HYRY's post!
    return [[range(m), make_test(m, n-1)] for i in range(n)]
import timeit
t = make_test(20, 8)
%timeit list2tuple(t)
%timeit colist2tuple_pool[0].send((t, colist2tuple_pool[1:]))

Results - notice the 'u' next to the 's' in the second line :-)

1 loops, best of 3: 1.32 s per loop
1 loops, best of 3: 4.05 us per loop

Really seems too fast to believe. Anybody know if timeit works with coroutines? Here is the old fashioned way:

tic = time.time()
t1 = colist2tuple_pool[0].send((t, colist2tuple_pool[1:]))
toc = time.time()
print toc - tic



Newer versions of Ipython and %timit give a warning:

The slowest run took 9.04 times longer than the fastest. This could
mean that an intermediate result is being cached 1000000 loops, best of 3: 317 ns per loop

After some further investigation, python generators are not magic and send is still a function call. The reason my generator based method appeared to be faster is that I was doing an inplace operation on the lists - which resulted in fewer function calls.

I wrote all this out with lots of additional detail in a recent talk.

Hope this helps someone looking to play with generators.