I am studying Ruby hash internals. Hash in Ruby are really efficient because (among others thing) in order to compare entries it just uses its hash value (for strings and numbers). I wondering how can it makes this assumption since the probability of computing the same hash value for two different keys is not zero.
Internally it uses the Murmur hash function.
As a reference: http://preshing.com/20110504/hash-collision-probabilities/
Can you share with us how you came to the conclusion that Ruby uses only the hash value to determine equality?
The text below is to explain to others your excellent point that the probability of computing the same hash value for two different keys is not zero, so how can the Hash class rely on just the hash value to determine equality?
For the purpose of this discussion I will refer to Ruby hashes as maps, so as not to confuse the 2 uses of the term hash in the Ruby language (1, a computed value on an object, and 2, a map/dictionary of pairs of values and unique keys).
As I understand it, hash values in maps, sets, etc. are used as a quick first pass at determining possible equality. That is, if the hashes of 2 objects are equal, then it is possible that the 2 objects are equal; but it's also possible that the 2 objects are not equal, but coincidentally produce the same hash value.
In other words, the only sure thing you can tell about equality from the hash values of the objects being compared is that if hash1 != hash2 then the objects are definitely not equal.
If the 2 hashes are equal, then the 2 objects must be compared by their content (in Ruby, by calling the
== method, I believe).
So comparing hashes is not a substitute for comparing the objects themselves, it's just a quick first pass used to optimize performance.