Question

Is it possible to combine a dictionary of lists together into one list?

Say I have a dictionary like so:

my_list = {
"foo": ["a", "b", "c"],
"bar": ["d", "e", "f"]
}


How could I combine all the lists in this dictionary into one large list in a single line of code (that is, without a temporary variable)? I came up with the following solution, but it is not very elegant:

def combine_list_dictionary():
    temp = []
    for key, value_list in my_list.items():  # .items() is needed to get (key, value) pairs
        temp += value_list
    return temp

combine_list_dictionary()  # ["a", "b", "c", "d", "e", "f"]


I don't mind that the keys are lost in the process.
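For reference, the loop above can also be written as a single expression with a nested list comprehension, which flattens the values without naming a temporary:

>>> [item for value_list in my_list.values() for item in value_list]
['a', 'b', 'c', 'd', 'e', 'f']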

Answer

Don't use sum to join lists. There is a long discussion on the python-ideas mailing list about why that is a bad idea (will get the link later).
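For reference, this is what the discouraged sum version looks like. It works here, but every + along the way copies the entire accumulated list, so the cost grows quadratically with the number of lists:

>>> sum(my_list.values(), [])
['a', 'b', 'c', 'd', 'e', 'f']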

itertools.chain is a good solution, or if you'd rather go functional:

>>> my_list = {
...     "foo": ["a", "b", "c"],
...     "bar": ["d", "e", "f"]
... }
>>> import operator as op
>>> from functools import reduce  # a builtin in Python 2; must be imported on Python 3
>>> reduce(op.concat, my_list.values())
['a', 'b', 'c', 'd', 'e', 'f']
>>>
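For comparison, the chain version (the same expression timed below) reads much the same; chain.from_iterable walks each value list in turn without building intermediate lists:

>>> import itertools
>>> list(itertools.chain.from_iterable(my_list.values()))
['a', 'b', 'c', 'd', 'e', 'f']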

Here is a performance comparison between chain and reduce for both small and large dictionaries.

>>> import random
>>> import itertools
>>> dict_of_lists = {k: range(random.randint(0, k)) for k in range(0, random.randint(0, 9))}
>>> %timeit list(itertools.chain.from_iterable(dict_of_lists.values()))
The slowest run took 12.72 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 995 ns per loop
>>> %timeit reduce(op.concat, dict_of_lists.values())
The slowest run took 19.77 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 467 ns per loop

reduce is about twice as fast as itertools.chain here (467 ns vs 995 ns), and the same holds for larger structures.

>>> dict_of_lists = {k: range(random.randint(0, k)) for k in range(0, random.randint(0, 9999))}
>>> %timeit list(itertools.chain.from_iterable(dict_of_lists.values()))
The slowest run took 6.47 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1 µs per loop
>>> %timeit reduce(op.concat, dict_of_lists.values())
The slowest run took 13.68 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 425 ns per loop
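%timeit is an IPython magic; a sketch of the same comparison in plain Python using the standard-library timeit module would look like this. Note the explicit list(range(...)): on Python 3, operator.concat rejects range objects, and randint(1, ...) guarantees reduce has at least one value to fold:

import itertools
import operator as op
import random
import timeit
from functools import reduce  # a builtin on Python 2

# Same shape of data as above, with explicit lists so op.concat works on Python 3
dict_of_lists = {k: list(range(random.randint(0, k)))
                 for k in range(random.randint(1, 9999))}

print(timeit.timeit(
    lambda: list(itertools.chain.from_iterable(dict_of_lists.values())),
    number=1000))
print(timeit.timeit(
    lambda: reduce(op.concat, dict_of_lists.values()),
    number=1000))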