Will Will - 1 year ago 74
Python Question

Why make lists unhashable?

A common issue on SO is removing duplicates from a list of lists. Since lists are unhashable,

set([[1, 2], [3, 4], [1, 2]])
TypeError: unhashable type: 'list'
. Answers to this kind of question usually involve using tuples, which are immutable and therefore hashable.

This answer to What makes lists unhashable? include the following:

If the hash value changes after it gets stored at a particular slot in the dictionary, it will lead to an inconsistent dictionary. For example, initially the list would have gotten stored at location A, which was determined based on the hash value. If the hash value changes, and if we look for the list we might not find it at location A, or as per the new hash value, we might find some other object.

but I don't quite understand because other types that can be used for dictionary keys can be changed without issue:

>>> d = {}
>>> a = 1234
>>> d[a] = 'foo'
>>> a += 1
>>> d[a] = 'bar'
>>> d
{1234: 'foo', 1235: 'bar'}

It is obvious that if the value of
changes, it will hash to a different location in the dictionary. Why is the same assumption dangerous for a list? Why is the following an unsafe method for hashing a list, since it is what we all use when we need to anyway?

>>> class my_list(list):
... def __hash__(self):
... return tuple(self).__hash__()
>>> a = my_list([1, 2])
>>> b = my_list([3, 4])
>>> c = my_list([1, 2])
>>> foo = [a, b, c]
>>> foo
[[1, 2], [3, 4], [1, 2]]
>>> set(foo)
set([[1, 2], [3, 4]])

It seems that this solves the
problem, why is this an issue? Lists may be mutable, but they are ordered which seems like it would be all that's needed for hashing.

Answer Source

You seem to confuse mutability with rebinding. a += 1 assigns a new object, the int object with the numeric value 1235, to a. Under the hood, for immutable objects like int, a += 1 is just the same as a = a + 1.

The original 1234 object is not mutated. The dictionary is still using an int object with numeric value 1234 as the key. The dictionary still holds a reference to that object, even though a now references a different object. The two references are independent.

Try this instead:

>>> class BadKey:
...     def __init__(self, value):
...         self.value = value
...     def __eq__(self, other):
...         return other == self.value
...     def __hash__(self):
...         return hash(self.value)
...     def __repr__(self):
...         return 'BadKey({!r})'.format(self.value)
>>> badkey = BadKey('foo')
>>> d = {badkey: 42}
>>> badkey.value = 'bar'
>>> print(d)
{BadKey('bar'): 42}

Note that I altered the attribute value on the badkey instance. I didn't even touch the dictionary. The dictionary reflects the; the actual key value itself was mutated, the object that both the name badkey and the dictionary reference.

However, you now can't access that key anymore:

>>> badkey in d
>>> BadKey('bar') in d
>>> for key in d:
...     print(key, key in d)
BadKey('bar') False

I have thoroughly broken my dictionary, because I can no longer reliably locate the key.

That's because BadKey violates the principles of hashability; that the hash value must remain stable. You can only do that if you don't change anything about the object that the hash is based on. And the hash must be based on whatever makes two instances equal.

For lists, the contents make two list objects equal. And you can change those, so you can't produce a stable hash either.