kamalbanga - 7 days ago
Python Question

Write data to Redis from PySpark

In Scala, we would write an RDD to Redis like this:

datardd.foreachPartition(iter => {
  val r = new RedisClient("hosturl", 6379)
  iter.foreach(i => {
    val (str, it) = i
    val map = it.toMap
    r.hmset(str, map)
  })
})


I tried doing the same in PySpark with datardd.foreachPartition(storeToRedis), where the function storeToRedis is defined as:

def storeToRedis(x):
    r = redis.StrictRedis(host='hosturl', port=6379)
    for i in x:
        r.hmset(i[0], dict(i[1]))


It gives me this:


ImportError: ('No module named redis', function subimport at
0x47879b0, ('redis',))


Of course, I have imported redis.

Answer

PySpark's SparkContext has an addPyFile method for exactly this. Package the redis module as a zip file and register it with that method:

sc = SparkContext(appName="analyze")
sc.addPyFile("/path/to/redis.zip")
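
Putting it together, here is a minimal end-to-end sketch, assuming a Redis server reachable at hosturl:6379 and that redis.zip was built from the redis package directory (for example with zip -r redis.zip redis); the sample RDD is only illustrative:

from pyspark import SparkContext

sc = SparkContext(appName="analyze")
sc.addPyFile("/path/to/redis.zip")  # ships the redis package to every executor

def storeToRedis(partition):
    # import inside the function so it resolves on the executors,
    # where the zip shipped via addPyFile is on the Python path
    import redis
    r = redis.StrictRedis(host='hosturl', port=6379)
    for key, values in partition:
        r.hmset(key, dict(values))  # one Redis hash per record, as in the Scala version

datardd = sc.parallelize([("user:1", [("name", "alice"), ("age", "30")])])
datardd.foreachPartition(storeToRedis)

The import inside storeToRedis is what resolves the original ImportError: the function is pickled and executed on the executors, so redis has to be importable there, not just on the driver.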