alec_djinn alec_djinn - 15 days ago 5
Python Question

Is there anything faster than dict()?

I need a faster way to store and access around 3GB of

k:v
pairs. Where
k
is a
string
or an
integer
and
v
is an
np.array()
that can be of different shapes.
Is there any object, that is faster than the standard python dict in storing and accessing such a table? For example,
a pandas.DataFrame
?

As far I have understood python dict is a quite fast implementation of a hashtable, is there anything better than that for my specific case?

Answer

No there is nothing faster than a dictionary, while the complexity of its indexing and even membership checking is O(1).

So once you saved your items in a dictionary you can access them very fast. Therefore it seems that the problem is not indexing. But you might be able to make the process slightly faster by doing some changes in your materials. For example, if your strings (keys) are not very large you can intern them, in order to be cashed in memory rather than being created as an object. If the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. That makes the access to object very faster. Python has provided an intern() function within sys module that you can use it for this aim.

Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup...

Here is an example:

In [49]: d = {'mystr{}'.format(i): i for i in range(30)}

In [50]: %timeit d['mystr25']
10000000 loops, best of 3: 46.9 ns per loop

In [51]: d = {sys.intern('mystr{}'.format(i)): i for i in range(30)}

In [52]: %timeit d['mystr25']
10000000 loops, best of 3: 38.8 ns per loop
Comments