Răzvan Panda Răzvan Panda - 3 months ago 15
Scala Question

How to create encoder for Option type constructor, e.g. Option[Int]?

Is it possible to use

Option[_]
member in a case class used with Dataset API? eg.
Option[Int]


I tried to find an example but could not find any yet. This can probably be done with with a custom encoder (mapping?) but I could not find an example for that yet.

This might be achievable using Frameless library: https://github.com/adelbertc/frameless but there should be an easy way to get it done with base Spark libraries.

Update

I am using:
"org.apache.spark" %% "spark-core" % "1.6.1"


And getting the following error when trying to use an Option[Int]:


Unable to find encoder for type stored in a Dataset. Primitive types
(Int, String, etc) and Product types (case classes) are supported by
importing sqlContext.implicits._ Support for serializing other types
will be added in future releases


Solution update

Since I was prototyping I was just declaring the case class inside the function before the conversion to the Dataset (in my case inside
object Main {
).

Option types worked just fine when I moved the case class outside of the Main function.

Answer

We only define implicits for a subset of the types we support in SQLImplicits. We should probably consider adding Option[T] for common T as the internal infrastructure does understand Option. You can workaround this by either creating a case class, using a Tuple or constructing the required implicit yourself (though this is using and internal API so may break in future releases).

implicit def optionalInt: org.apache.spark.sql.Encoder[Option[Int]] = org.apache.spark.sql.catalyst.encoders.ExpressionEncoder()

val ds = Seq(Some(1), None).toDS()