anthonybell - 15 days ago
Scala Question

How to randomly sample p percent of users in a user event stream

I am looking for an algorithm that fairly samples p percent of users from an infinite list of users.

A naive algorithm looks something like this:

val p = 0.03 // 3 percent of users

// This is naive. What is a better way?
// floorMod keeps the bucket in 0 to 999 even when hashCode is negative.
def userIdToRandomNumber(userId: Int): Double =
  Math.floorMod(userId.toString.hashCode, 1000) / 1000.0

if (userIdToRandomNumber(user.userId) < p) {
  processUser(user)
}


There are issues with this code, though: hashCode may favor shorter strings, and the modulo arithmetic discretizes the value, so the sampled fraction is not exactly p.
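To make the discretization point concrete, here is a small sketch that counts how many of the 1000 possible buckets pass the check for a given p. The helper name acceptedBuckets and the example thresholds are made up for illustration; they are not part of the code above.

```scala
// Counts how many of the 1000 possible buckets pass `bucket / 1000.0 < p`.
// acceptedBuckets is a hypothetical helper, used only for this illustration.
def acceptedBuckets(p: Double): Int =
  (0 until 1000).count(bucket => bucket / 1000.0 < p)

// Example thresholds chosen to show the step effect.
val counts = Seq(0.03, 0.0305, 0.031).map(acceptedBuckets) // 30, 31, 31
```

The effective rate can only move in steps of 0.1 percent, so 0.0305 and 0.031 end up selecting exactly the same buckets.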

What is the "more correct" way of finding a deterministic mapping of userIds to a random number for the function userIdToRandomNumber above?

Answer

Try the method below instead of hashCode. Even for short strings, the integer values of the characters usually push the sum past 100. It also avoids the division, and with it the rounding errors.

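  // True when the string's bucket (0 to 99) falls below the p threshold.
  def inScope(s: String, p: Double): Boolean = modN(s, 100) < p * 100

  // Sums the character codes of s and reduces the sum modulo n.
  def modN(s: String, n: Int): Int =
    s.map(_.toInt).sum % n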
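
A quick way to check how close this gets to p on your own IDs is to run it over a sample and measure the selected fraction. The sketch below assumes the user IDs are the numbers 1 to 1000 rendered as strings; sampleIds, selected and rate are names made up for the illustration, and the two helpers are repeated so the snippet runs on its own.

```scala
// Same helpers as in the answer, repeated so this snippet is self-contained.
def modN(s: String, n: Int): Int = s.map(_.toInt).sum % n
def inScope(s: String, p: Double): Boolean = modN(s, 100) < p * 100

// Hypothetical input: user IDs 1 to 1000 as strings (an assumption).
val sampleIds: Seq[String] = (1 to 1000).map(_.toString)

// Keep only the users that fall inside the 3 percent threshold.
val selected: Seq[String] = sampleIds.filter(inScope(_, 0.03))

// Fraction of users actually selected; 15 of 1000 for this made-up range.
val rate: Double = selected.size.toDouble / sampleIds.size
```

The decision depends only on the ID, so the same user always gets the same answer. On this made-up range the rate comes out at 1.5 percent rather than 3, because the character sums of purely numeric IDs cluster in a narrow band, so it is worth measuring the rate on real IDs before relying on it.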