Peter Peter - 8 months ago 53
Python Question

Why do -1 and -2 both hash to -2 in CPython?

Possible Duplicate:

When is a python object's hash computed and why is the hash of -1 different?

Why do
both hash to the same number if Python?

Since they do, how does Python tell these two numbers apart?

>>> -1 is -2
>>> hash(-1) is hash(-2)
>>> hash(-1)
>>> hash(-2)

Answer Source

Update - i think this now a wiki, please feel free to add more info.

-1 is a reserved value at the C level (of CPython - see comments from DSM about this being "as expected" in ironpython and pypy) .

See this Quora answer:

If you write a type in a C extension module and provide a tp_hash method, you have to avoid -1 -- if you return -1, Python will assume you meant to throw an error.

If you write a class in pure Python and provide a __hash__ method, there's no such requirement, thankfully. But that's because the C code that invokes your __hash__ method does that for you -- if your __hash__ returns -1, then hash() applied to your object will actually return -2.

which really just repackages info from the effbot:

The hash value -1 is reserved (it’s used to flag errors in the C implementation). If the hash algorithm generates this value, we simply use -2 instead.

Also, from agf in the comments (where people are asking for more info), the source.

Since they do, how does Python tell these two numbers apart?

Since all hash functions map a large input space to a smaller input space, collisions are always expected, no matter how good the hash function is. Think of hashing strings, for example. If hash codes are 32-bit integers, you have 2^32 (a little more than 4 billion) hash codes. If you consider all ASCII strings of length 6, you have (2^7)^6 (just under 4.4 trillion) different items in your input space. With only this set, you are guaranteed to have many, many collisions no matter how good you are. Add Unicode characters and strings of unlimited length to that!

Therefore, the hash code only hints at the location of an object, an equality test follows to test candidate keys. To implement a membership test in a hash-table set, the hash code gives you "bucket" number in which to search for the value. However, all set items with the same hash code are in the bucket. For this, you also need an equality test to distinguish between all candidates in the bucket.

This hash code and equality duality is hinted at in the CPython documentation on hashable objects. In other languages/frameworks, there is a guideline/rule that if you provide a custom hash code function, you must also provide a custom equality test (performed on the same fields as the hash code function).

Indeed, the Python release today address exactly this, with a security patch that addresses the efficiency issue when this (identical hash values, but on a massive scale) is used as a denial of service attack -