Greg Peckory - 1 year ago 78
Java Question

# Double Hashing efficiency with word dictionary

Simply put I have a dictionary of words and I'm adding them to a hash table.

I am using Double Hashing (not the conventional method) and the following is yielding the best result.

``````    public static int getHashKey(String word) {

int index = 0;

for(int i = 0; i<word.length(); i++){

index += Math.pow(4,  i)*((int)word.charAt(i));
index = index % size;
}
return index;
}

public static int getDoubleHashKey(String word) {

int jump = 1;

for(int i = 0; i<word.length(); i++){

jump = jump * word.charAt(i);
jump = jump % size;
}
return jump;

}
``````

This is giving me 127,000 collisions. I also have a 2 fold prime hash table size and it cannot be changed.

Is there any way the Double Hashing algorithm can be improved? (Either of the 2 methods above).

I know it depends on what we are storing in the hash table etc. but is there any intuitive method or some tips that apply more generally so I can avoid a couple more collisions.

I ran a little Scala program on a dictionary of about 336 531 entries. There are significantly less collisions for the version 2 (118 142) than for the version 1 (305 431). Notice that the version 2 is close to an optimal number of collisions because 118 142 + 216 555 = 334 697, so 334 697/336 531 = 99,46% of values used in the 0-216555 range. Using the modulo outside the loop does improve your hash method.

``````import scala.io.Source

object Hash extends App {
val size = 216555
def doubleHashKey1(word: String) = {
var jump = 1;
for (ch <- word) {
jump = jump * ch;
jump = jump % size;
}
jump
}

def doubleHashKey2(word: String) = {
var jump = 1;
for (ch <- word) jump = jump * ch;
jump % size;
}

def countCollisions(words: Set[String], hashFun: String => Int) = words.size - words.map(hashFun).size