I am reading the Spark RDD operations documentation and executing:
In : lines = sc.textFile("data")
In : lines.getNumPartitions()
In : lineLengths = lines.map(lambda s: len(s))
In : lineLengths.getNumPartitions()
In : len(lineLengths.collect())
lineLengths.count() tells you the number of rows (elements) in your RDD.

lineLengths.getNumPartitions(), as you noted, is the number of partitions your RDD is distributed over. Each partition holds a subset of the RDD's rows, so count() equals the sum of the row counts across all partitions.
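To make the relationship concrete without needing a Spark cluster, here is a plain-Python sketch of the same idea: the data and partition count below are hypothetical, and the lists stand in for an RDD's partitions. A map-style transformation works element-by-element inside each partition and preserves the partitioning, while count() corresponds to summing row counts across partitions.

```python
# Plain-Python analogy (not actual Spark): an RDD's rows are split
# across partitions, and transformations run per element within each one.
lines = ["spark", "rdd", "operations", "example"]  # hypothetical input rows

num_partitions = 2  # hypothetical; Spark derives this from input splits
# Round-robin split of the rows into partitions.
partitions = [lines[i::num_partitions] for i in range(num_partitions)]

# Analogue of lines.map(lambda s: len(s)): applied inside each partition,
# so the number of partitions does not change.
line_lengths = [[len(s) for s in part] for part in partitions]

num_parts = len(line_lengths)                      # ~ getNumPartitions()
total_rows = sum(len(part) for part in line_lengths)  # ~ count()
print(num_parts)   # 2
print(total_rows)  # 4
```

This is why getNumPartitions() stays the same before and after the map, while count() (or len(collect())) reflects the total rows regardless of how they are distributed.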