kamalbanga - 1 month ago
Python Question

Write data to Redis from PySpark

In Scala, we would write an RDD to Redis like this:

datardd.foreachPartition(iter => {
  val r = new RedisClient("hosturl", 6379)
  iter.foreach(i => {
    val (str, it) = i
    val map = it.toMap
    r.hmset(str, map)
  })
})

I tried doing the same in PySpark:

datardd.foreachPartition(storeToRedis)

where the function storeToRedis is defined as:

import redis

def storeToRedis(partition):
    r = redis.StrictRedis(host='hosturl', port=6379)
    for key, value in partition:
        r.hmset(key, dict(value))

It gives me this:

ImportError: ('No module named redis', <function subimport at
0x47879b0>, ('redis',))

Of course, I have imported redis.


PySpark's SparkContext has an addPyFile method for exactly this situation. Package the redis module as a zip file and pass it to this method; Spark will ship the archive to every worker so the import succeeds there:

sc = SparkContext(appName = "analyze")
sc.addPyFile("/path/to/redis.zip")  # placeholder path to your zipped redis package
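Why does a zip file work here? addPyFile distributes the archive to the executors and puts it on their sys.path, and Python can import packages directly from zip archives. A minimal local illustration of that mechanism, with no Spark or Redis required (the module name mymod and the temp paths are hypothetical):

```python
import os
import sys
import tempfile
import zipfile

# Build a zip containing a toy package, the way you would zip up
# the redis package directory before calling addPyFile.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "mymod.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mymod/__init__.py", "VALUE = 42\n")

# addPyFile effectively does this on every executor: the archive
# goes on sys.path, and zipimport resolves imports from it.
sys.path.insert(0, zip_path)

import mymod
print(mymod.VALUE)  # → 42
```

The same applies on the cluster: once redis.zip is on the executors' sys.path, the `import redis` inside your partition function resolves normally.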