jjepsuomi - 1 year ago

Random sampling without replacement when more samples are needed than there are elements

I need to draw samples from a list of numbers, but I may need more samples than there are numbers in the list. More explicitly, this is what I need to do:

  • Let the total number of elements in my list be N.

  • I need to draw M samples from this list randomly without replacement.

  • If M <= N, then simply use NumPy's random.choice with replace=False.

  • If M > N, then the samples must consist of X copies of all N numbers in the list, where X is the number of times N fully divides M, i.e. X = floor(M/N), followed by the remaining M - X*N samples drawn from the list without replacement.
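The rule in the bullets above can be sketched without a Python-level loop, assuming the input is converted to a NumPy array: `np.tile` repeats the whole list X times and a single `choice(..., replace=False)` call draws the remainder. This is a minimal sketch; the helper name `sample_with_full_passes` and the use of `numpy.random.default_rng` are my own choices, not something from the question, and it assumes the list is non-empty.

```python
import numpy as np


def sample_with_full_passes(values, m, rng=None):
    """Hypothetical helper: draw m samples from values following the rule above.

    Every element appears floor(m / n) times; the remaining m % n samples are
    drawn from values without replacement. Assumes values is non-empty.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    n = values.size
    x, remainder = divmod(m, n)          # x = floor(m / n), remainder = m - x * n
    full_passes = np.tile(values, x)     # all n values, repeated x times
    # no duplicates among the extra draws
    extra = rng.choice(values, size=remainder, replace=False)
    return np.concatenate([full_passes, extra])


print(sample_with_full_passes([1, 2, 3, 4, 5], 8))
```

The first five entries of that output are always 1 to 5 in order; only the last three are random. When M <= N, `x` is 0, `np.tile` returns an empty array, and the call reduces to a plain draw without replacement.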

For example, let my list be the following:

L = [1, 2, 3, 4, 5]

and I need 8 samples. Then I first take the full list once and then draw an additional 3 elements randomly without replacement, so my samples could be, e.g.:

Sampled_list = [1, 2, 3, 4, 5, 3, 5, 1]
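As a quick sanity check of the rule on this example (N = 5 and M = 8 give X = 1 and a remainder of 3), here is the `divmod` arithmetic plus a hypothetical helper `follows_rule` that tests whether a sample's element counts are consistent with the rule. It only checks counts, not order, and assumes the list itself has no duplicate values.

```python
from collections import Counter


def follows_rule(population, sample):
    """Hypothetical checker: do the element counts in sample match the rule?

    Assumes population contains no duplicate values; order is ignored.
    """
    n, m = len(population), len(sample)
    x, remainder = divmod(m, n)
    counts = Counter(sample)
    if set(counts) - set(population):
        return False  # sample contains a value that is not in the population
    per_value = [counts[v] for v in population]
    # every value appears x times, except `remainder` values that appear x + 1 times
    return (all(c in (x, x + 1) for c in per_value)
            and sum(c == x + 1 for c in per_value) == remainder)


L = [1, 2, 3, 4, 5]
print(divmod(8, len(L)))                          # (1, 3): X = 1, remainder = 3
print(follows_rule(L, [1, 2, 3, 4, 5, 3, 5, 1]))  # True
print(follows_rule(L, [1, 2, 3, 4, 5, 3, 3, 1]))  # False: 3 drawn twice in the remainder
```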

How can I implement this in Python as efficiently as possible in terms of computation time? Can it be done without for-loops?

At the moment I'm implementing this with for-loops, but that is too slow for my purposes. I have also tried NumPy's random.choice without replacement, but that requires M <= N.
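The limitation mentioned above is easy to reproduce: with `replace=False`, `np.random.choice` can draw at most as many samples as the population has elements, and asking for more raises a `ValueError`. A small demonstration using the list from the question:

```python
import numpy as np

L = [1, 2, 3, 4, 5]

# Works: 3 <= len(L), so 3 distinct elements can be drawn
picked = np.random.choice(L, 3, replace=False)
print(picked)

# Fails: 8 distinct elements cannot be drawn from a population of 5
error = None
try:
    np.random.choice(L, 8, replace=False)
except ValueError as err:
    error = err
    print("ValueError:", err)
```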

Thank you for any help!

Answer

I would just wrap numpy's random.choice() like so:

import numpy as np

L = [1, 2, 3, 4, 5]

def wrap_choice(list_to_sample, no_samples):
    list_size = len(list_to_sample)
    # number of complete copies of the list, i.e. floor(no_samples / list_size)
    takes = no_samples // list_size
    # draw the rest without replacement (np.random.choice defaults to replace=True)
    rest = np.random.choice(list_to_sample, no_samples - takes * list_size, replace=False).tolist()
    return list_to_sample * takes + rest

print(wrap_choice(L, 2))   # e.g. [5, 1]
print(wrap_choice(L, 13))  # e.g. [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 3, 4, 1]

Edit: There is no need to check the length first. The rule you use for M > N also works when M <= N: then takes is 0, no full copies are added, and all samples come from the draw without replacement.
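A quick check of the `takes`/remainder arithmetic for a few request sizes, assuming the five-element list from the answer (this only traces the integer division, not the random draw):

```python
L = [1, 2, 3, 4, 5]

rows = []
for no_samples in (2, 5, 8):
    takes, remainder = divmod(no_samples, len(L))  # full copies, extra draws
    rows.append((no_samples, takes, remainder))
    print(no_samples, L * takes, remainder)

# prints:
# 2 [] 2
# 5 [1, 2, 3, 4, 5] 0
# 8 [1, 2, 3, 4, 5] 3
```

For 2 samples the full-copy part is the empty list, so the result is just a draw without replacement; for exactly 5 the remainder is 0 and the draw contributes nothing.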