epic_last_song - 11 days ago
Python Question

Divide the values of two RDDs in Spark (Python)

I have these two key-value RDDs in Spark:

rdd1 = [(u'Key1', 4), (u'Key2', 6), (u'Key3', 10)]
rdd2 = [(u'Key1', 4), (u'Key2', 3), (u'Key3', 2)]


I am looking for the Spark function that divides the values key by key (rdd3 = rdd1 / rdd2).

In this case, the result should be:

rdd3 = [(u'Key1', 1), (u'Key2', 2), (u'Key3', 5)]

Answer

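You can use union() in combination with reduceByKey(). Note that reduceByKey() expects a commutative and associative function, and division is neither, so this only yields rdd1/rdd2 when the rdd1 value reaches the lambda first for every key, which Spark does not guarantee: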

rdd1.union(rdd2).reduceByKey(lambda x,y: x/y).sortByKey().collect()
#Out[4]: [(u'Key1', 1), (u'Key2', 2), (u'Key3', 5)]
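
Because division is not commutative, the result of reduceByKey() above depends on the order in which the two values for a key arrive. An alternative that keeps the operand order explicit is join() followed by mapValues(). The following is a minimal sketch, assuming a local PySpark installation; divide_by_key and demo are hypothetical helper names used only for illustration, not part of Spark.

```python
def divide_by_key(numerators, denominators):
    """Divide the values of two pair RDDs key by key (numerators / denominators).

    join() emits (key, (left_value, right_value)) for every key present in
    both RDDs, so the left value is always the numerator.
    """
    return numerators.join(denominators).mapValues(lambda pair: pair[0] / pair[1])


def demo():
    # Assumption: PySpark is installed and a local SparkContext can be created.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd1 = sc.parallelize([(u'Key1', 4), (u'Key2', 6), (u'Key3', 10)])
    rdd2 = sc.parallelize([(u'Key1', 4), (u'Key2', 3), (u'Key3', 2)])
    # sortByKey() is only here to make the collected list easy to read.
    return divide_by_key(rdd1, rdd2).sortByKey().collect()
```

Calling demo() should return the quotients for Key1, Key2 and Key3 in key order. On Python 3, / is true division, so the values come back as floats; use // inside the lambda if integer results are wanted. Since join() is an inner join, keys that appear in only one of the RDDs are dropped, and a zero in rdd2 raises ZeroDivisionError.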