eLearner eLearner - 1 year ago 113
Python Question

Shuffling combinations without converting iterable (itertools.combinations) to list

The following simple code gives me the possible combinations of length 3 of 200 elements.

from itertools import combinations
comb = combinations( range(200), 3 )

I want to get the combinations in a random order in order to pick the first N combinations. However, if I convert comb to a list and shuffle it as following, I may get a memory error because the list might contain too many elements:

comb = list(comb) # This might be huge and give a memory error
N = 10
comb = comb[:10] # get only the first N random combinations

Is there any other way to get N random combinations ? (i.e., not in the order generated by itertools.combinations).

Answer Source

There are C(200, 3) = 1313400 possible combinations. As you also mentioned, this number can easily get out of hand due to the combinatorial explosion. For example, if you choose 4 instead of 3 elements, the number of combinations will be approximately 50 times larger (64684950). Instead of randomly selecting from these combinations, you can randomly build possible combinations.

To build those combinations, you can use random.sample from the random library. random.sample(range(200), 3) will randomly generate one of those 1313400 combinations. If you call it again, it will generate another combination.

There are two issues:

  1. Order is important in random.sample ([1, 2, 3] is different than [1, 3, 2]). In combinations, it is not. To solve that, you can use sorted().
  2. random.sample will independently generate the next 3 numbers. Therefore, the combinations generated in different iterations could be the same. Although it is very unlikely for this example (≈0.0000343), you can use a set to store the combinations so that only unique combinations will be stored.

The following will generate 10 different combinations:

import random
combs = set()
N = 10
while len(combs) < N:
    combs.add(tuple(sorted(random.sample(range(200), 3))))

About the random_combination function in itertools docs: The function starts by building a "pool". That pool consists of 1313400 different combinations. It then makes the selection out of those 1313400 combinations so it has no memory advantage over your approach.