Sarathkumar Vulchi - 1 month ago
Scala Question

How to work with a Dataset in Spark using Scala?

I load my CSV into a DataFrame and then convert it to a Dataset, but the IDE shows this:

Multiple markers at this line:
- Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
- not enough arguments for method as: (implicit evidence$2: org.apache.spark.sql.Encoder[DataSet.spark.aacsv])org.apache.spark.sql.Dataset[DataSet.spark.aacsv]. Unspecified value parameter evidence$2

How do I resolve this? My code is:

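import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}

case class aaCSV(
  a: String,
  b: String
)

object WorkShop {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("readCSV")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val customSchema = StructType(Array(
      StructField("a", StringType, true),
      StructField("b", StringType, true)))

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .schema(customSchema)
      .load("/xx/vv/ss.csv")
    df.printSchema()
    df.show()
    // `import sqlContext.implicits._` is not in scope here, so the compiler
    // cannot find the implicit Encoder[aaCSV] that `as` requires.
    val googleDS = df.as[aaCSV]
    googleDS.show()
  }

}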
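The encoder error can be illustrated without Spark. The sketch below is a toy model in plain Scala, not Spark's implementation: `ToyEncoder`, `ToyFrame`, `AaCsv` and `toyImplicits` are hypothetical stand-ins for `Encoder`, `DataFrame`, the question's `aaCSV` and `sqlContext.implicits`. Like Spark's `as[U : Encoder]`, the toy `as` takes the encoder as an implicit parameter (the `evidence$...` in the error), so the call only compiles once an import brings a matching instance into scope.

```scala
import scala.annotation.implicitNotFound

// Hypothetical stand-in for Spark's Encoder: turns a row into a typed value.
@implicitNotFound("Unable to find encoder for type ${T}. Try importing toyImplicits._")
trait ToyEncoder[T] {
  def decode(row: Map[String, String]): T
}

// Same shape as the question's case class aaCSV.
case class AaCsv(a: String, b: String)

// Hypothetical stand-in for a DataFrame: rows keyed by column name.
final case class ToyFrame(rows: List[Map[String, String]]) {
  // `U: ToyEncoder` desugars to an extra parameter list
  // (implicit evidence$1: ToyEncoder[U]).
  def as[U: ToyEncoder]: List[U] =
    rows.map(row => implicitly[ToyEncoder[U]].decode(row))
}

// Plays the role of sqlContext.implicits: importing it supplies the encoder.
object toyImplicits {
  implicit val aaCsvEncoder: ToyEncoder[AaCsv] = new ToyEncoder[AaCsv] {
    def decode(row: Map[String, String]): AaCsv = AaCsv(row("a"), row("b"))
  }
}

object EncoderDemo {
  def main(args: Array[String]): Unit = {
    val df = ToyFrame(List(Map("a" -> "x", "b" -> "y")))
    // Without this import, `df.as[AaCsv]` fails to compile with the
    // implicitNotFound message above.
    import toyImplicits._
    println(df.as[AaCsv]) // List(AaCsv(x,y))
  }
}
```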


I then changed the main function like this:

def main(args: Array[String]): Unit = {
  val conf = new SparkConf()
    .setAppName("readCSV")
    .setMaster("local")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._ // brings an Encoder[aaCSV] into scope for `as`
  val sa = sqlContext.read.csv("/xx/vv/ss.csv").as[aaCSV]
  sa.printSchema()
  sa.show()
}


But it throws this error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'Adj_Close' given input columns: [_c1, _c2, _c5, _c4, _c6, _c3, _c0]; line 1 pos 7

What should I do?

Answer

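Does your CSV file have a header row (column names)? If so, add .option("header", "true") to the read statement, for example: sqlContext.read.option("header", "true").csv("/xx/vv/ss.csv").as[aaCSV]. Without that option, Spark treats the first line as data and names the columns _c0, _c1, and so on, so as[aaCSV] cannot find columns matching the case class's field names.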
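To see why the header option matters, here is a toy model in plain Scala of the column naming described above. It is not Spark code: `HeaderDemo`, `columnNames` and `missingFields` are hypothetical helpers, and the sample lines assume a file whose first line is `a,b`. It mimics two behaviours: without a header, columns get generated names `_c0`, `_c1`, ...; and `as[T]` looks up each case-class field by name.

```scala
// Toy model (not Spark code) of CSV column naming and by-name field lookup.
object HeaderDemo {
  // With header = true, names come from the first line; otherwise names are
  // generated as _c0, _c1, ... and the first line would remain a data row.
  def columnNames(lines: List[String], header: Boolean): List[String] = {
    val first = lines.headOption.map(_.split(",", -1).toList).getOrElse(Nil)
    if (header) first
    else first.indices.map(i => s"_c$i").toList
  }

  // as[T] resolves each case-class field by name; any field missing from the
  // columns corresponds to a "cannot resolve" error.
  def missingFields(fields: List[String], columns: List[String]): List[String] =
    fields.filterNot(columns.contains)

  def main(args: Array[String]): Unit = {
    val lines = List("a,b", "1,2")    // assumed file contents
    val fields = List("a", "b")       // fields of case class aaCSV
    println(missingFields(fields, columnNames(lines, header = false))) // List(a, b)
    println(missingFields(fields, columnNames(lines, header = true)))  // List()
  }
}
```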

This blog post shows different ways of creating DataFrames and Datasets: http://technippet.blogspot.in/2016/10/different-ways-of-creating.html