owlsocks owlsocks - 7 months ago 101
Scala Question

How to convert a case-class-based RDD into a DataFrame?

The Spark documentation shows how to create a DataFrame from an RDD, using Scala case classes to infer a schema. I am trying to reproduce this concept using

sqlContext.createDataFrame(RDD, CaseClass)
, but my DataFrame ends up empty. Here's my Scala code:

// sc is the SparkContext, while sqlContext is the SQLContext.

// Define the case class and raw data
case class Dog(name: String)
val data = Array(

// Create an RDD from the raw data
val dogRDD = sc.parallelize(data)

// Print the RDD for debugging (this works, shows 2 dogs)

// Create a DataFrame from the RDD
val dogDF = sqlContext.createDataFrame(dogRDD, classOf[Dog])

// Print the DataFrame for debugging (this fails, shows 0 dogs)

The output I'm seeing is:


What am I missing?



All you need is just

val dogDF = sqlContext.createDataFrame(dogRDD)

Second parameter is part of Java API and expects you class follows java beans convention (getters/setters). Your case class doesn't follow this convention, so no property is detected, that leads to empty DataFrame with no columns.