anthonybell - 15 days ago 5
Scala Question

# How to randomly sample p percent of users in user event stream

I am looking for an algorithm that fairly samples p percent of users from an infinite list of users.

A naive algorithm looks something like this:

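``````
val p = 0.03 // sample 3 percent of users

// This is naive. What is a better way?
// Maps a user ID deterministically to a value in [0, 1).
def userIdToRandomNumber(userId: Int): Double =
  Math.floorMod(userId.toString.hashCode, 1000) / 1000.0 // floorMod keeps the bucket non-negative

if (userIdToRandomNumber(user.userId) < p) {
  processUser(user)
}
``````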

There are issues with this code, though: `hashCode` may favor shorter strings, the modulo arithmetic discretizes the value so the sampled fraction is not exactly p, and so on.
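To make the discretization point concrete, here is a small sketch that counts how many of the 1000 possible buckets pass the `< p` test. The `effectiveRate` helper and the `DiscretizationCheck` object are made up for illustration, and the sketch assumes every bucket is equally likely, which the hash may not guarantee.

```scala
object DiscretizationCheck {
  // Fraction of the residues 0 until buckets that satisfy bucket / buckets < p.
  // This is the sampling rate the modulo scheme can actually deliver,
  // assuming (hypothetically) that the hash spreads users evenly over buckets.
  def effectiveRate(p: Double, buckets: Int): Double =
    (0 until buckets).count(b => b / buckets.toDouble < p) / buckets.toDouble

  def main(args: Array[String]): Unit = {
    println(effectiveRate(0.03, 1000))   // 0.03: p is a multiple of 1/1000
    println(effectiveRate(0.0305, 1000)) // 0.031: rounded up to the next bucket
  }
}
```

So with 1000 buckets the rate can only move in steps of 0.1 percentage points. More buckets shrink the step, but any scheme based on a finite number of buckets has the same limit.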

What is the "more correct" way of finding a deterministic mapping of `userId`s to a random number for the function `userIdToRandomNumber` above?

Try the method(s) below instead of `hashCode`. Even for short strings, the integer values of the characters ensure that the sum goes over 100. It also avoids the division, so you avoid rounding errors.
``````
  def inScope(s: String, p: Double) = modN(s, 100) < p * 100