Hassan Baig - 1 month ago
Python Question

Benchmarking retrieval from Redis vs memory in Python (using timeit)

I have a list of numbers. This list is stored in two ways: either as an in-memory Python object, or as a Redis list (Redis is set up on the same server).

I'm comparing the time it takes to retrieve these two lists, using Python's timeit. Here's what I do in the Python shell:

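import timeit
import redis

# Connect to the local Redis server through a shared connection pool.
POOL = redis.ConnectionPool(host='127.0.0.1', port=6379, db=0)
my_server = redis.Redis(connection_pool=POOL)

# Best of 7 runs, each run fetching the whole list 1000 times.
print(min(timeit.Timer('pylist1 = my_server.lrange("nums:5", 0, -1)', setup='from __main__ import my_server').repeat(7, 1000)))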


This gives me a time of 1.92341279984.

Next, I time the in-memory python object like so:

# Fetch the list once, then time only the assignment of the in-memory object.
pylist = my_server.lrange("nums:5", 0, -1)
print(min(timeit.Timer('pylist2 = pylist', setup='from __main__ import pylist').repeat(7, 1000)))


This gives me a time of 4.29153442383e-05. I.e. it seems to be ~45K times faster than retrieving the same list from Redis.

My question is this: is my comparison approach correct? I.e., am I accurately simulating retrieval from Redis vs retrieval from memory? This is a huge performance boost for the use case I have in mind, but before I implement this, I just want to be sure I didn't fudge the benchmarking.

Answer

In the comparison you've put up here, the second case is basically just measuring how long Python takes to bind a new name to an existing object; nothing is copied or fetched. So it doesn't surprise me that this is vastly faster than communicating with a different process (Redis). I guess what surprises me is that you would consider getting a value from Redis if the option exists simply to keep it in memory.
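To make the name-binding point concrete, here is a minimal sketch that needs no Redis server. The 1000-element list is a hypothetical stand-in for the data in "nums:5", and the copy is timed only as a contrast, not as a model of what Redis does.

```python
import timeit

# Hypothetical stand-in for the list fetched from Redis.
SETUP = 'pylist = list(range(1000))'

# Plain assignment binds a second name to the same list object.
pylist = list(range(1000))
pylist2 = pylist
same_object = pylist2 is pylist  # True: no copy was made

# Best of 7 runs of 1000 executions each, as in the question.
bind_time = min(timeit.repeat('pylist2 = pylist', setup=SETUP, repeat=7, number=1000))
copy_time = min(timeit.repeat('pylist2 = list(pylist)', setup=SETUP, repeat=7, number=1000))

print(same_object)
print(bind_time)
print(copy_time)
```

The binding time stays roughly constant however long the list is, which is why it says nothing about the cost of retrieving the data.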

So, you need to be clearer about why you are using Redis for this in the first place. It will always be slower than in-process memory; no benchmark is needed for that. You need to ask, "Why am I not just using Python lists and dictionaries?" There are several valid answers: your data is too large to fit into memory, you require cache-specific features such as letting values expire after a while, or you want to use it for IPC or persistence. Once you know the answer here, that will inform the benchmarking you want to do. The question then becomes more like "How do I obtain the benefits/features I have listed above for the least performance penalty?" Redis may not be the only answer. You may consider shelve for persistence, or perhaps even a full-on relational database or Mongo or whatever.
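If persistence turns out to be the reason, the shelve option mentioned above could look roughly like the sketch below. The data, the temporary directory, and the reuse of the key "nums:5" are assumptions made for illustration; a real setup would pick its own storage path.

```python
import contextlib
import os
import shelve
import shutil
import tempfile

nums = [3, 1, 4, 1, 5, 9]  # hypothetical data

# Hypothetical storage location: a throwaway temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'nums_shelf')

# Write the list to disk under a string key.
with contextlib.closing(shelve.open(path)) as db:
    db['nums:5'] = nums

# Reopen the shelf and read the list back (it is unpickled into a new object).
with contextlib.closing(shelve.open(path)) as db:
    restored = db['nums:5']

print(restored == nums)

# Remove the temporary files created for this sketch.
shutil.rmtree(tmp_dir)
```

Reading from a shelf still involves disk access and unpickling, so it should be benchmarked against Redis for the same job, not against a bare in-memory name binding.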

In short, once you have a good idea of why, the how often solves itself.