Slim AZAIZ - 11 months ago
Scala Question

Why is the number of partitions in sortByKey() not equal to one by default?

When I execute:

list.sortByKey.take(10).foreach(println)


the result is not correct. However, when I modify it to:

list.sortByKey(false,1).take(10).foreach(println)


I get the correct result.

Answer Source

1)

  xxx.sortByKey().foreach(println)

foreach runs in parallel across the partitions, so you will not get a global ordering. The printed output may be interleaved.

2)

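The following code works only because it forces a single partition. It does not scale, and it starts breaking on a cluster or with more than one worker, where println runs on the executors rather than on the driver: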

 xxx.sortByKey(numPartitions=1).foreach(println)

3)

  xxx.sortByKey().collect

collect returns an array with the partitions concatenated in their sorted order, so the result on the driver is fully ordered.
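
To see why concatenating the partitions gives a fully sorted result even when there is more than one partition, here is a small Spark-free sketch in plain Scala. It assumes, as a simplification, that sortByKey splits the keys into ordered ranges (Spark uses a RangePartitioner for this) and sorts within each range. The names RangeSortSketch, sortIntoPartitions and collectInOrder, as well as the fixed bounds, are made up for illustration; real Spark samples the data to choose the bounds.

```scala
// Simplified, Spark-free model of a range-partitioned sort (an illustration, not Spark's code).
object RangeSortSketch {
  type Pair = (Int, String)

  // Hypothetical helper: place each pair into a key range, then sort inside each range.
  // `upperBounds` must be ascending; keys above the last bound go to the final partition.
  def sortIntoPartitions(data: Seq[Pair], upperBounds: Seq[Int]): Vector[Vector[Pair]] = {
    val buckets = data.groupBy { case (key, _) =>
      val idx = upperBounds.indexWhere(bound => key <= bound)
      if (idx == -1) upperBounds.length else idx
    }
    Vector.tabulate(upperBounds.length + 1) { i =>
      buckets.getOrElse(i, Seq.empty[Pair]).sortBy(_._1).toVector
    }
  }

  // Model of collect: concatenate the partitions in partition order.
  def collectInOrder(partitions: Vector[Vector[Pair]]): Vector[Pair] =
    partitions.flatten

  def main(args: Array[String]): Unit = {
    val data = Seq(5 -> "e", 1 -> "a", 4 -> "d", 2 -> "b", 6 -> "f", 3 -> "c")
    val partitions = sortIntoPartitions(data, upperBounds = Seq(2, 4))
    // Each partition is sorted, and every key in partition i is <= every key in partition i + 1.
    partitions.zipWithIndex.foreach { case (p, i) => println(s"partition $i: $p") }
    // Concatenating in partition order yields a global sort: prints 1,2,3,4,5,6
    println(collectInOrder(partitions).map(_._1).mkString(","))
  }
}
```

Under this model the sort stays distributed, which is why the default is not a single partition. The order is lost only when the partitions are visited concurrently, as in case 1.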
