The numbers show the ID of the Spark stage that is currently running, plus the number of completed, in-progress, and total tasks in that stage. (See What do the numbers on the progress bar mean in spark-shell? for more on the progress bar.)
Spark stages run their tasks in parallel; in your case, 5 tasks are running in parallel at the moment. If each task takes roughly the same time, this should give you an idea of how much longer you have to wait for this stage to finish.
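For illustration, a progress line in spark-shell looks something like this (the numbers here are made up):

```
[Stage 2:===============>                        (10 + 5) / 40]
```

Read this as: stage 2 is running, 10 tasks have completed, 5 are in progress, and the stage has 40 tasks in total.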
RDD.take can take more than one stage.
take(1) first fetches the first element of the first partition. If that partition is empty, the next stage looks at the second, third, fourth, and fifth partitions. In each stage, the number of partitions examined is 4× the number of partitions already checked. So if you have a whole lot of empty partitions,
take(1) can take many stages. This can be the case, for example, if you have a large amount of data and then do
filter(_.name == "John").take(1).
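To get a feel for how the number of scanned partitions grows, here is a small plain-Scala sketch (no Spark required). It only models the 4× growth rule described above in the worst case where every scanned partition is empty; the helper name is made up:

```scala
// Hypothetical helper: estimates how many stages take(1) launches
// when every partition it scans turns out to be empty.
def stagesForTakeOne(totalPartitions: Int): Int = {
  var checked = 0
  var toScan  = 1   // the first stage scans a single partition
  var stages  = 0
  while (checked < totalPartitions) {
    checked += math.min(toScan, totalPartitions - checked)
    stages  += 1
    toScan   = 4 * checked  // next stage scans 4x the partitions already checked
  }
  stages
}
```

With 1,000 all-empty partitions, the stages scan 1, 4, 20, 100, 500, and then the remainder, so 6 stages in total. The growth is geometric, so even a huge number of partitions needs only a modest number of stages, but each extra stage adds scheduling overhead and runs mostly one after the other.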
If you know your result will be small, you can save time by using
collect instead of
take(1). collect always gathers all the data in a single stage. The main advantage is that all the partitions are then processed in parallel, rather than in the somewhat sequential manner of take.