akonsu - 1 year ago 89

Python Question

I am trying to write an algorithm that would pick N distinct items from an sequence at random, without knowing the size of the sequence in advance, and where it is expensive to iterate over the sequence more than once. For example, the elements of the sequence might be the lines of a huge file.

I have found a solution when N=1 (that is, when trying to pick exactly one element at random from a huge sequence):

`import random`

items = range(1, 10) # Imagine this is a huge sequence of unknown length

count = 1

selected = None

for item in items:

if random.random() * count < 1:

selected = item

count += 1

But how can I achieve the same thing for other values of N (say, N=3)?

Recommended for you: Get network issues from **WhatsUp Gold**. **Not end users.**

Answer Source

Use reservoir sampling. It's a very simple algorithm that works for any `N`

.

Recommended from our users: **Dynamic Network Monitoring from WhatsUp Gold from IPSwitch**. ** Free Download**