wvjgsuhp wvjgsuhp - 2 months ago 16
Python Question

list comprehension and map without lambda on long string

$ python -m timeit -s'tes = "987kkv45kk321"*100' 'a = [list(i) for i in tes.split("kk")]'
10000 loops, best of 3: 79.4 usec per loop

$ python -m timeit -s'tes = "987kkv45kk321"*100' 'b = list(map(list, tes.split("kk")))'
10000 loops, best of 3: 66.9 usec per loop

$ python -m timeit -s'tes = "987kkv45kk321"*10' 'a = [list(i) for i in tes.split("kk")]'
100000 loops, best of 3: 8.34 usec per loop

$ python -m timeit -s'tes = "987kkv45kk321"*10' 'b = list(map(list, tes.split("kk")))'
100000 loops, best of 3: 7.38 usec per loop

$ python -m timeit -s'tes = "987kkv45kk321"' 'a = [list(i) for i in tes.split("kk")]'
1000000 loops, best of 3: 1.51 usec per loop

$ python -m timeit -s'tes = "987kkv45kk321"' 'b = list(map(list, tes.split("kk")))'
1000000 loops, best of 3: 1.63 usec per loop


I tried using timeit and wonder why creating list of lists from string.split() with list comprehension is faster for a shorter string but slower for longer string.

Answer

The fixed setup costs for map are higher than the setup costs for the listcomp solution. But the per-item costs for map are lower. So for short inputs, map is paying more in fixed setup costs than it saves on the per item costs (because there are so few items). When the number of items increases, the fixed setup costs for map don't change, but the savings per item is being reaped for more items, so map slowly pulls ahead.

Things that map saves on:

  1. Only looks up list once (the listcomp has to look it up in the builtin namespace every single loop, after checking the nested and global scopes first, because it can't guarantee list isn't overridden from loop to loop)
  2. Executes no Python bytecode per item (because the mapping function is also C level), so the interpreter doesn't get involved at all, reducing the amount of hot C level code

map loses on the actual call to map (C built-in functions are fast to run, but comparatively slow to call, especially if they take variable length arguments), and the creation and cleanup of the map object (the listcomp closure is compiled up front). But as I noted above, neither of these is tied to the size of the inputs, so you make up for it rapidly if the mapping function is a C builtin.