Nick Johnson - 1 year ago 163

Python Question

Recently I needed to do weighted random selection of elements from a list, both with and without replacement. While there are well known and good algorithms for unweighted selection, and some for weighted selection without replacement (such as modifications of the resevoir algorithm), I couldn't find any good algorithms for weighted selection with replacement. I also wanted to avoid the resevoir method, as I was selecting a significant fraction of the list, which is small enough to hold in memory.

Does anyone have any suggestions on the best approach in this situation? I have my own solutions, but I'm hoping to find something more efficient, simpler, or both.

Recommended for you: Get network issues from **WhatsUp Gold**. **Not end users.**

Answer Source

One of the fastest ways to make many with replacement samples from an unchanging list is the alias method. The core intuition is that we can create a set of equal-sized bins for the weighted list that can be indexed very efficiently through bit operations, to avoid a binary search. It will turn out that, done correctly, we will need to only store two items from the original list per bin, and thus can represent the split with a single percentage.

Let's us take the example of five equally weighted choices, `(a:1, b:1, c:1, d:1, e:1)`

To create the alias lookup:

Normalize the weights such that they sum to

`1.0`

.`(a:0.2 b:0.2 c:0.2 d:0.2 e:0.2)`

This is the probability of choosing each weight.Find the smallest power of 2 greater than or equal to the number of variables, and create this number of partitions,

`|p|`

. Each partition represents a probability mass of`1/|p|`

. In this case, we create`8`

partitions, each able to contain`0.125`

.Take the variable with the least remaining weight, and place as much of it's mass as possible in an empty partition. In this example, we see that

`a`

fills the first partition.`(p1{a|null,1.0},p2,p3,p4,p5,p6,p7,p8)`

with`(a:0.075, b:0.2 c:0.2 d:0.2 e:0.2)`

If the partition is not filled, take the variable with the most weight, and fill the partition with that variable.

Repeat steps 3 and 4, until none of the weight from the original partition need be assigned to the list.

For example, if we run another iteration of 3 and 4, we see

`(p1{a|null,1.0},p2{a|b,0.6},p3,p4,p5,p6,p7,p8)`

with `(a:0, b:0.15 c:0.2 d:0.2 e:0.2)`

left to be assigned

At runtime:

Get a

`U(0,1)`

random number, say binary`0.001100000`

bitshift it

`lg2(p)`

, finding the index partition. Thus, we shift it by`3`

, yielding`001.1`

, or position 1, and thus partition 2.If the partition is split, use the decimal portion of the shifted random number to decide the split. In this case, the value is

`0.5`

, and`0.5 < 0.6`

, so return`a`

.

Here is some code and another explanation, but unfortunately it doesn't use the bitshifting technique, nor have I actually verified it.

Recommended from our users: **Dynamic Network Monitoring from WhatsUp Gold from IPSwitch**. ** Free Download**