javadba - 1 year ago
Python Question

Show partitions on a pyspark RDD

The pyspark RDD documentation does not show any method(s) to display partition information for an RDD.

Is there any way to get that information without executing an additional step, e.g.:

myrdd.mapPartitions(lambda x: iter([1])).sum()

The above does work, but it seems like extra effort.

Answer Source

I missed it: very simple:


Not used to the java-ish getFooMethod() anymore ;)
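
For anyone landing here later, a minimal sketch of the getter-style approach, assuming the method meant above is pyspark's RDD.getNumPartitions(). The helper name describe_partitions is hypothetical and only for illustration; the per-partition sizes via glom() are an extra, not part of the original answer.

```python
def describe_partitions(rdd):
    """Return (partition_count, per_partition_sizes) for a pyspark RDD.

    describe_partitions is a hypothetical helper name, not a pyspark API.
    """
    # getNumPartitions() reads the partition count from RDD metadata,
    # so no Spark job is launched for this part.
    partition_count = rdd.getNumPartitions()

    # glom() turns each partition into a single list; mapping len over
    # those lists gives the element count per partition. Unlike the
    # call above, this does run a job.
    sizes = rdd.glom().map(len).collect()

    return partition_count, sizes
```

With a live SparkContext sc, describe_partitions(sc.parallelize(range(10), 4)) should return 4 as its first element. If only the count is needed, myrdd.getNumPartitions() on its own is enough and avoids the job that glom() triggers.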
