numegil - 7 months ago 42

Python Question

I feel like I'm way overthinking this problem, but here goes anyway...

I have a hash table with M slots in its internal array. I need to insert N elements into the hash table. Assuming that I have a hash function that randomly inserts am element into a slot with equal probability for each slot, what's the expected value of the total number of hash collisions?

(Sorry that this is more of a math question than a programming question).

Edit:

Here's some code I have to simulate it using Python. I'm getting numerical answers, but having trouble generalizing it to a formula and explaining it.

`import random`

import pdb

N = 5

M = 8

NUM_ITER = 100000

def get_collisions(table):

col = 0

for item in table:

if item > 1:

col += (item-1)

return col

def run():

table = [0 for x in range(M)]

for i in range(N):

table[int(random.random() * M)] += 1

#print table

return get_collisions(table)

# Main

total = 0

for i in range(NUM_ITER):

total += run()

print float(total)/NUM_ITER

Answer

You'll find the answer here: Quora.com. The expected number of collisions for **m** buckets and **n** inserts is

`n - m * (1 - ((m-1)/m)^n)`

.