Carl Carl - 3 months ago 60
Scala Question

Spark 2.0 Scala - RDD.toDF()

I am working with Spark 2.0 Scala. I am able to convert an RDD to a DataFrame using the toDF() method.

val rdd = sc.textFile("/pathtologfile/logfile.txt")
val df = rdd.toDF()


But for the life of me I cannot find where this is in the API docs. It is not under RDD, but it is under Dataset (link 1). However, I have an RDD, not a Dataset.

Also I can't see it under implicits (link 2).

So please help me understand why toDF() can be called for my RDD. Where is this method being inherited from?

Answer

It's coming from here:

Spark 2.0 API

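Explanation: if you import sqlContext.implicits._ (or spark.implicits._ when working with a SparkSession in Spark 2.0), an implicit method, rddToDatasetHolder, comes into scope. The compiler uses it to convert the RDD to a DatasetHolder, and toDF is then called on that DatasetHolder. So toDF is not inherited by RDD at all; it is reached through an implicit conversion.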
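
To make the conversion visible, here is a minimal sketch that builds the same DataFrame twice: once by calling toDF() on the RDD and letting the compiler apply the implicit, and once by calling rddToDatasetHolder by hand. The object name ToDFDemo, the helper names viaImplicit and viaExplicit, the local[*] master and the sample strings are assumptions made for illustration; they are not part of the original question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object; names here are illustrative only.
object ToDFDemo {

  // Implicit form: RDD has no toDF method, so the compiler looks for an
  // implicit conversion in scope and finds rddToDatasetHolder.
  def viaImplicit(spark: SparkSession, rdd: RDD[String]): DataFrame = {
    import spark.implicits._
    rdd.toDF()
  }

  // Explicit form: the same call with the conversion written out.
  // The Encoder[String] it needs is also supplied by spark.implicits._.
  def viaExplicit(spark: SparkSession, rdd: RDD[String]): DataFrame = {
    import spark.implicits._
    rddToDatasetHolder(rdd).toDF()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for the demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toDF-demo")
      .getOrCreate()

    // Stand-in for sc.textFile(...): any RDD[String] behaves the same way.
    val rdd = spark.sparkContext.parallelize(Seq("first line", "second line"))

    viaImplicit(spark, rdd).printSchema()
    viaExplicit(spark, rdd).printSchema()

    spark.stop()
  }
}
```

In spark-shell the import of spark.implicits._ is done for you at startup, which is why rdd.toDF() works there without any visible import.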
