anthonybell anthonybell - 1 year ago 71
Scala Question

How to randomly sample p percent of users in user event stream

I am looking for an algorithm that fairly samples p percent of users from an infinite list of users.

A naive algorithm looks something like this:

val p = 0.03 // 3 percent of users

//This is naive.. what is a better way??
// floorMod keeps the bucket in 0..999 even when hashCode is negative
def userIdToRandomNumber(userId: Int): Double = Math.floorMod(userId.toString.hashCode, 1000) / 1000.0

if (userIdToRandomNumber(user.userId) < p) {

There are issues with this code, though: hashCode may favor shorter strings, and the modulo arithmetic discretizes the value into 1000 buckets, so the sampled fraction is not exactly p.
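To make the discretization concern concrete, here is a small sketch that counts how many of the 1000 buckets pass the `< p` test. The object name `Discretization` and the helper `effectiveRate` are hypothetical and only serve as illustration; it assumes the hash spreads users evenly over the buckets, which is the best case.

```scala
object Discretization {
  // Fraction of buckets that satisfy `bucket / buckets < p`,
  // i.e. the sampling rate actually achieved if the hash were perfectly uniform.
  def effectiveRate(p: Double, buckets: Int = 1000): Double =
    (0 until buckets).count(b => b / buckets.toDouble < p) / buckets.toDouble
}
```

With 1000 buckets, any p that is not a multiple of 0.001 gets rounded up to the next bucket boundary, so `effectiveRate(0.0305)` comes out as 0.031 rather than 0.0305.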

What is the "more correct" way of finding a deterministic mapping of userIds to a random number for the function above?

Answer

Try the methods below instead of hashCode. Even for short strings, the integer values of the characters ensure that the sum goes over 100. Also, compare against p * 100 instead of dividing, so you avoid rounding errors.

  def inScope(s: String, p: Double) = modN(s, 100) < p * 100

  // Sum the character codes of s and reduce the total modulo n
  def modN(s: String, n: Int): Int = {
    var sum = 0
    for (c <- s) { sum += c }
    sum % n
  }
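If summing character codes turns out to spread IDs too unevenly, one alternative (not part of the answer above) is to use a general-purpose hash and scale its 32-bit output to [0, 1). The following is a minimal sketch using `MurmurHash3.stringHash` from the Scala standard library; the object name `UserSampler`, the helper names, and the seed value are assumptions chosen for illustration.

```scala
import scala.util.hashing.MurmurHash3

object UserSampler {
  // Deterministically map a user ID to a value in [0, 1).
  // The seed is arbitrary; it must stay fixed so a user always maps to the same value.
  def userIdToUnitInterval(userId: String, seed: Int = 42): Double = {
    val h = MurmurHash3.stringHash(userId, seed)   // signed 32-bit hash
    val unsigned = h.toLong & 0xFFFFFFFFL          // reinterpret as 0 .. 2^32 - 1
    unsigned.toDouble / 4294967296.0               // divide by 2^32
  }

  // True if this user falls into the sampled fraction p (0.0 to 1.0).
  def inSample(userId: String, p: Double): Boolean =
    userIdToUnitInterval(userId) < p
}
```

Usage would look like `UserSampler.inSample(user.userId.toString, 0.03)`. Because the hash range has 2^32 values rather than 100 or 1000, the discretization error in p becomes negligible, and the same user is always either in or out of the sample.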