Kim Kim - 1 month ago 20
R Question

R callback functions using sparklyr

I hope to use mapPartitions and reduce function of Spark (http://spark.apache.org/docs/latest/programming-guide.html), using sparklyr.

It is easy in pyspark, the only thing I need to use is a plain python code. I can simply add python functions as callback function. So easy.

For example, in pyspark, I can use those two functions as follows:

mapdata = self.rdd.mapPartitions(mycbfunc1(myparam1))
res = mapdata.reduce(mycbfunc2(myparam2))


However, it seems this is not possible in R, for example sparklyr library. I checked RSpark, but it seems it is another way of query/wrangling data in R, nothing else.

I would appreciate if someone let me know how to use those two functions in R, with R callback functions.

Answer

In SparkR you could use internal functions - hence the prefix SparkR::: - to accomplish the same.

newRdd = SparkR:::toRDD(self)                  
mapdata = SparkR:::mapPartitions(newRdd, function(x) { mycbfunc1(x, myparam1)})
res = SparkR:::reduce(mapdata, function(x) { mycbfunc2(x, myparam2)})

I believe sparklyr interfaces only with the DataFrame / DataSet API.

Comments