smeeb - 1 month ago
Scala Question

Creating a Spark DataFrame from a single string

I'm trying to take a hardcoded String and turn it into a 1-row Spark DataFrame (with a single column of type StringType) such that:

val fizz: String = "buzz"


Would result in a DataFrame whose .show() output looks like:

+-----+
| fizz|
+-----+
| buzz|
+-----+


My best attempt thus far has been:

val rawData = List("fizz")
val df = sqlContext.sparkContext.parallelize(Seq(rawData)).toDF()

df.show()


But I get the following exception at runtime:

java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:413)
at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:155)


Any ideas as to where I'm going awry? Also, how do I set "buzz" as the row value for the fizz column?




Update:



Trying:

sqlContext.sparkContext.parallelize(rawData).toDF()


I get a DF that looks like:

+----+
|  _1|
+----+
|buzz|
+----+

Answer

Try:

sqlContext.sparkContext.parallelize(rawData).toDF()
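
As far as I can tell, the difference from the attempt in the question is the element type. Seq(rawData) is a Seq[List[String]] with one element that is itself a list, so Spark infers an ArrayType for it and then fails when it needs a StructType (a row). Passing rawData directly gives a collection whose elements are plain strings. A minimal plain-Scala sketch of that difference, with no Spark involved (the object name ElementTypes and the sample value "buzz" are only for illustration):

```scala
object ElementTypes {
  // Hypothetical sample value; the question wants "buzz" as the row value.
  val rawData: List[String] = List("buzz")

  // What the question passes to parallelize: one element, and that element is a List.
  val wrapped: Seq[List[String]] = Seq(rawData)

  // What this answer passes: one element, and that element is a String.
  val flat: Seq[String] = rawData

  def main(args: Array[String]): Unit = {
    println(wrapped.head) // the single element is the whole list
    println(flat.head)    // the single element is the string itself
  }
}
```

This is my reading of the stack trace, not something confirmed elsewhere in the thread.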

In Spark 2.0 you can do:

import spark.implicits._

rawData.toDF

Optionally provide a sequence of names for toDF:

sqlContext.sparkContext.parallelize(rawData).toDF("fizz")
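
Putting the pieces together for Spark 2.x, a minimal sketch might look like the following. It assumes a local SparkSession; the object name SingleStringDataFrame, the helper build and the app name are hypothetical and not from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleStringDataFrame {
  // Builds a 1-row, 1-column DataFrame: the column is named "fizz" and holds `value`.
  def build(spark: SparkSession, value: String): DataFrame = {
    // Brings toDF into scope for local Scala collections.
    import spark.implicits._
    Seq(value).toDF("fizz")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is fine for trying this out.
    val spark = SparkSession.builder()
      .appName("single-string-df")
      .master("local[*]")
      .getOrCreate()

    val df = build(spark, "buzz")
    df.printSchema() // a single string column named fizz
    df.show()

    spark.stop()
  }
}
```

Passing the name to toDF is what replaces the default column name with fizz; the row value comes from the string in the sequence.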