pypep278 - 1 month ago
Python Question

Removing objects from Redis based on pattern match

I am using Redis as a data store/cache for my application. I push data to the Redis instance after pickling it into a string. My data is a Python object (i.e., key-value pairs, pickled into a string). I am using the redis library for Python.

My data gets pushed periodically, and a given host may stop pushing data, for example because it went down. I want to purge that host's data once the host goes down; I already have a trigger in place that notifies my app when that happens.

However, I am unsure how to purge data from Redis efficiently when doing so means un-pickling the data and checking it for a certain key-value pair. I would like to do this in place if possible. Any help with this will be truly appreciated!

EDIT:

This is what I use to push data to Redis:

self.redis.zadd("mymsgs", pickle.dumps(msg), int(time.time()+360))


The message itself is of the format:

{'hostname': 'abc1', 'version': 'foo', 'uptime': 'bar'}
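For reference, the in-place purge the question describes could look roughly like the sketch below. It assumes `r` is a redis-py (3.x or later) client created without `decode_responses=True`, so members come back as the original pickled bytes, and that every message lives in the single `mymsgs` sorted set shown above; `members_from_host` and `purge_host_in_place` are hypothetical helper names. Every member must be fetched and unpickled on each purge, so the cost grows with the size of the whole set.

```python
import pickle


def members_from_host(members, hostname):
    """Return the raw (still pickled) members whose message has this hostname."""
    # Each member must be unpickled to look inside it: O(N) work per purge.
    # Only unpickle data you trust; pickle.loads can execute arbitrary code.
    return [m for m in members if pickle.loads(m).get("hostname") == hostname]


def purge_host_in_place(r, hostname, key="mymsgs"):
    """Remove one host's messages from the shared sorted set.

    `r` is assumed to be a redis-py client; returns how many members were removed.
    """
    # ZRANGE key 0 -1 fetches every member of the sorted set.
    doomed = members_from_host(r.zrange(key, 0, -1), hostname)
    if not doomed:
        return 0
    # ZREM takes several members at once and removes them by exact value,
    # so only the entries fetched above are touched.
    return r.zrem(key, *doomed)
```

Because ZREM matches members by their exact bytes, messages pushed between the ZRANGE and the ZREM are left alone rather than removed by mistake.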

Answer

If I understood correctly, what I would recommend (if possible, of course) is that you change the format of the keys a bit. Instead of using a generic mymsgs key, I would recommend adding the hostname to the key itself. For instance, it could be mymsgs_from_HOSTNAME.

Since you can use wildcards to fetch keys, when you want to get all the messages you can list the keys matching mymsgs_from_* and then get the values of those keys. That way, when you know that the host called HOSTNAME is down, you can purge all its entries at once with delete("mymsgs_from_HOSTNAME"), without unpickling and checking every message.
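A sketch of that per-host layout, assuming a redis-py 3.x (or later) client `r` and the mymsgs_from_HOSTNAME naming above; `KEY_PREFIX`, `host_key`, `push_message`, `all_messages` and `purge_host` are hypothetical names. It lists keys with `scan_iter`, which wraps the SCAN command, rather than KEYS: KEYS walks the whole keyspace in one blocking call, which the Redis documentation advises against in production.

```python
import pickle
import time

# Hypothetical prefix following the suggested "mymsgs_from_HOSTNAME" scheme.
KEY_PREFIX = "mymsgs_from_"


def host_key(hostname):
    """Build the per-host sorted-set key, e.g. 'mymsgs_from_abc1'."""
    return KEY_PREFIX + hostname


def push_message(r, msg, ttl=360):
    """Add a pickled message to its host's sorted set, scored with now + ttl."""
    # redis-py >= 3.0 takes a {member: score} mapping.
    r.zadd(host_key(msg["hostname"]), {pickle.dumps(msg): int(time.time() + ttl)})


def all_messages(r):
    """Yield the messages of every host, walking keys with SCAN rather than KEYS."""
    for key in r.scan_iter(match=KEY_PREFIX + "*"):
        for member in r.zrange(key, 0, -1):
            yield pickle.loads(member)


def purge_host(r, hostname):
    """Drop everything a host has pushed with one DEL; returns keys removed (0 or 1)."""
    return r.delete(host_key(hostname))
```

With this layout, purging a host costs a single DEL no matter how many messages the other hosts have stored.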

See this example:

import redis
import time
import pickle

redis_connection = redis.Redis(host='localhost', port=6379, db=0)

# This "for" loop is just a simple populator, to put a bunch of key/values in Redis
for hostname in ['abc1', 'foo2', 'foo3']:
    msg = {'hostname': hostname, 'version': 'foo', 'uptime': 'bar'}

    # Step 1, store the data using a key that contains the hostname:
    redis_key = "messages_from_host_%s" % hostname
    # redis-py >= 3.0 takes a {member: score} mapping; the 2.x Redis class took (member, score)
    redis_connection.zadd(redis_key, {pickle.dumps(msg): int(time.time() + 360)})

# Ok... I have some sample data in Redis now...
# Shall we begin?...

# Let's say I want to get all the messages from all the hosts:
# First, I find all the keys that can contain messages from hosts
matching_keys = redis_connection.keys("messages_from_host_*")
print("Got these keys that match what I wanna get: %s" % matching_keys)
# Then I iterate through the keys and get the actual zrange (~value) of each
print("These are the messages from all those hosts:")
for matching_key in matching_keys:
    messages = [pickle.loads(s) for s in redis_connection.zrange(matching_key, 0, -1)]
    print(messages)

# Let's say that now, I discover that host called `foo2` is down, and I want
# to remove all its information:
redis_connection.delete("messages_from_host_foo2")

# All the entries for the host `foo2` should be gone:
print("Now, I shouldn't be seeing information from `foo2`")
matching_keys = redis_connection.keys("messages_from_host_*")
for matching_key in matching_keys:
    messages = [pickle.loads(s) for s in redis_connection.zrange(matching_key, 0, -1)]
    print(messages)

Which outputs something like the following (KEYS returns keys in no particular order, and Python 3 prints the key names as bytes):

Got these keys that match what I wanna get: ['messages_from_host_foo2', 'messages_from_host_foo3', 'messages_from_host_abc1']
These are the messages from all those hosts:
[{'uptime': 'bar', 'hostname': 'foo2', 'version': 'foo'}]
[{'uptime': 'bar', 'hostname': 'foo3', 'version': 'foo'}]
[{'uptime': 'bar', 'hostname': 'abc1', 'version': 'foo'}]
Now, I shouldn't be seeing information from `foo2`
[{'uptime': 'bar', 'hostname': 'foo3', 'version': 'foo'}]
[{'uptime': 'bar', 'hostname': 'abc1', 'version': 'foo'}]
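Because each member is scored with int(time.time() + 360), that score can be read as an expiry timestamp (an assumption about its intent). Under that reading, stale entries from a host that stopped pushing can also be dropped by score with ZREMRANGEBYSCORE, which works for both the single mymsgs set and the per-host keys. A minimal sketch with hypothetical helper names `drop_expired` and `drop_expired_everywhere`, again assuming a redis-py client `r`:

```python
import time


def drop_expired(r, key, now=None):
    """Remove members of `key` whose score (read as an expiry time) has passed.

    Returns the number of members removed.
    """
    if now is None:
        now = int(time.time())
    # ZREMRANGEBYSCORE removes every member with min <= score <= max.
    return r.zremrangebyscore(key, "-inf", now)


def drop_expired_everywhere(r, pattern="messages_from_host_*", now=None):
    """Apply drop_expired to every key matching `pattern`; returns total removed."""
    if now is None:
        now = int(time.time())
    total = 0
    # scan_iter walks matching keys incrementally (SCAN) instead of using KEYS.
    for key in r.scan_iter(match=pattern):
        total += drop_expired(r, key, now)
    return total
```

Redis deletes a sorted set automatically once its last member is removed, so a host whose entries have all expired loses its per-host key as well.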