blue-sky - 1 month ago 43
Scala Question

How to print the contents of RDD?

I'm attempting to print the contents of a collection to the Spark console.

I have a value of this type:

linesWithSessionId: org.apache.spark.rdd.RDD[String] = FilteredRDD[3]


And I use the command:

scala> linesWithSessionId.map(line => println(line))


But this is printed instead:


res1: org.apache.spark.rdd.RDD[Unit] = MappedRDD[4] at map at :19


How can I write the RDD to console or save it to disk so I can view its contents?

Answer

The map function is a transformation, which means Spark does not evaluate it until you run an action on the resulting RDD. That is why the shell only reports a new RDD[Unit] and prints none of the lines.

To print it, you can use foreach (which is an action):

linesWithSessionId.foreach(println)
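
One caveat: foreach runs on the executors, so on a cluster the lines end up in the executors' stdout rather than in the driver console. To see them in the shell, bring the data back to the driver first. The following is a minimal sketch, assuming a local master and made-up sample data; the object name PrintRdd and the helper firstLines are hypothetical and only there for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PrintRdd {
  // Hypothetical helper: fetch at most n elements to the driver.
  // Safe for large RDDs because only n elements are transferred.
  def firstLines(rdd: RDD[String], n: Int): Array[String] = rdd.take(n)

  def main(args: Array[String]): Unit = {
    // Local master and sample data are assumptions for illustration only.
    val conf = new SparkConf().setAppName("print-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val linesWithSessionId: RDD[String] =
        sc.parallelize(Seq("sessionId=1 GET /a", "sessionId=2 GET /b", "sessionId=3 GET /c"))

      // collect() copies the whole RDD to the driver; use it only for small RDDs.
      linesWithSessionId.collect().foreach(println)

      // take(n) copies only the first n elements, which is safer for large RDDs.
      firstLines(linesWithSessionId, 2).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In the Spark shell, where sc already exists, the two relevant lines are linesWithSessionId.collect().foreach(println) and linesWithSessionId.take(2).foreach(println).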

To write it to disk, you can use one of the saveAs... functions (also actions) from the RDD API, such as saveAsTextFile.
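
As a rough sketch of the disk option, assuming a local master, made-up sample data, and a temporary output directory: saveAsTextFile writes a directory of part files (one per partition) and fails if the target path already exists. The object name SaveRdd and the helper saveAndReload are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SaveRdd {
  // Hypothetical helper: write the RDD as text, then read it back to check the result.
  def saveAndReload(sc: SparkContext, rdd: RDD[String]): Array[String] = {
    // The output path must not exist yet, so use a fresh child of a temp directory.
    val outDir = Files.createTempDirectory("rdd-out").resolve("lines").toString

    // Action: writes one part-NNNNN file per partition under outDir.
    rdd.saveAsTextFile(outDir)

    // textFile reads every part file in the directory back into an RDD.
    sc.textFile(outDir).collect()
  }

  def main(args: Array[String]): Unit = {
    // Local master and sample data are assumptions for illustration only.
    val conf = new SparkConf().setAppName("save-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val linesWithSessionId = sc.parallelize(Seq("sessionId=1 GET /a", "sessionId=2 GET /b"))
      saveAndReload(sc, linesWithSessionId).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In the Spark shell the write itself is a single call, linesWithSessionId.saveAsTextFile(path), where path is a directory of your choosing that does not yet exist.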