javadba - 3 months ago
Python Question

Show partitions on a pyspark RDD

The pyspark RDD documentation

http://spark.apache.org/docs/1.2.1/api/python/pyspark.html#pyspark.RDD

does not list any method for displaying partition information for an RDD.

Is there any way to get that information without executing an additional step, e.g.:

myrdd.mapPartitions(lambda part: [1]).sum()  # emit a 1 per partition, then sum


The above works, but it seems like extra effort.

Answer

I missed it; it's very simple:

rdd.getNumPartitions()

I'm just not used to the Java-ish getFoo() method naming anymore ;)
