I have a list of lists (which can contain up to 90k elements)
[[1,2,3], [1,2,4], [1,2,3], [1,2,4], [1,2,5]]
Keep a map of already seen elements with the associated id.
from itertools import count from collections import defaultdict mapping = defaultdict(count().__next__) result =  for element in my_list: result.append(mapping[tuple(element)])
you could also use a list-comprehension:
result = [mapping[tuple(element)] for element in my_list]
lists aren't hashable so you have to convert them to a
tuple when storing them as keys of the mapping.
defaultdict will assign a default value when it cannot find a key. The default value is obtained by calling the function provided in the constructor.
In this case the
__next__ method of the
count() generator yields increasing numbers.
As a more portable alternative you could do:
from functools import partial mapping = defaultdict(partial(next, count()))
An alternative solution, as proposed in the comments, is to just use the index as unique id:
result = [my_list.index(el) for el in my_list]
This is imple however:
For comparison of the two solutions see:
In : from itertools import count ...: from collections import defaultdict In : def hashing(seq): ...: mapping = defaultdict(count().__next__) ...: return [mapping[tuple(el)] for el in seq] ...: In : def indexing(seq): ...: return [seq.index(i) for i in seq] ...: In : from random import randint In : seq = [[randint(1, 20), randint(1, 20), randint(1, 20)] for _ in range(90000)] In : %timeit hashing(seq) 10 loops, best of 3: 37.7 ms per loop In : %timeit indexing(seq) 1 loop, best of 3: 26 s per loop
Note how for a 90k element list the mapping solution takes less 40 milliseconds while the indexing solution takes 26 seconds.