I am looking for an algorithm that fairly samples p percent of users from an infinite list of users.
A naive algorithm looks something like this:
val p = 0.03 // 3 percent of users
//This is naive.. what is a better way??
def userIdToRandomNumber(userId: Int): Float = userId.toString.hashCode % 1000)/1000.0
if (userIdToRandomNumber(user.userId) < p) {
processUser(user)
}
userId
userIdToRandomNumber
Try the method(s) below instead of the hashCode
. Even for short strings, the values of the characters as integers ensure that the sum goes over 100. Also, avoid the division, so you avoid rounding errors
def inScope(s: String, p: Double) = modN(s, 100) < p * 100
def modN(s: String, n: Int): Int = {
var sum = 0
for (c <- s) { sum += c }
sum % n
}