Carlos Bribiescas - 3 months ago
Scala Question

Why no encoder when mapping lines into Array[String]?

Spark is giving me a compile-time error:

Error:(49, 13) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._ Support for serializing other types will be added in future releases.
.map(line => line.split(delimiter))
^


for the following code:

val digital2 = sqlContext.read.text("path").as[String]
  .map(line => line.split(delimiter))
  .map(lineSplit => {
    new MyType(lineSplit(0), lineSplit(1), lineSplit(2), lineSplit(3),
      lineSplit(4).toInt, lineSplit(5).toInt, lineSplit(6).toInt, lineSplit(7).toInt)
  })


However, this code works just fine:

val digital = sqlContext.read.text("path").as[String]
  .map(line => {
    val lineSplit = line.split(delimiter)
    new MyType(lineSplit(0), lineSplit(1), lineSplit(2), lineSplit(3),
      lineSplit(4).toInt, lineSplit(5).toInt, lineSplit(6).toInt, lineSplit(7).toInt)
  })


I'm not following what is going on. Can someone explain?

Answer

In the first example, .map(line => line.split(delimiter)) returns a Dataset[Array[String]], so Spark needs an encoder for Array[String]. That encoder was only added in Spark 1.6.1; on older Spark versions the code does not compile. In the second example, the split and the MyType construction happen inside a single map, going directly from String to MyType, so no intermediate Dataset[Array[String]] (and hence no Array[String] encoder) is ever needed.
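As a sketch of the same idea outside Spark (so it runs without a cluster): fusing the split and the constructor call into one function means the Array[String] is only a local value inside the lambda, never the element type of the collection. The MyType case class and the sample lines below are hypothetical, shortened to three fields for illustration.

```scala
// Hypothetical 3-field stand-in for the question's MyType.
case class MyType(a: String, b: String, n: Int)

val delimiter = ","
val lines = Seq("x,y,1", "p,q,2")

// Split and construct in ONE map: the intermediate Array[String]
// stays inside the lambda. In Spark, this is why no encoder for
// Array[String] is required by the second snippet.
val parsed = lines.map { line =>
  val lineSplit = line.split(delimiter)
  MyType(lineSplit(0), lineSplit(1), lineSplit(2).toInt)
}
// parsed == Seq(MyType("x","y",1), MyType("p","q",2))
```

In Spark, the alternative fix is simply upgrading to 1.6.1 or later, where the built-in implicits provide an encoder for Array[String] and the two-step version compiles as well.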
