Stephane Stephane - 1 month ago 21
Scala Question

Apache Spark: Hold Custom class in GraphX: Not Serializable?

I have an Article class:

case class Article(articleName: String,
                   id: Option[Long],
                   authors: Iterator[Author],
                   keywords: Iterator[String])


(Author is a class that holds four Option[String] fields.)

I want to create a graph out of it, so I created an RDD of vertices and an RDD of edges:

val vertices: RDD[(VertexId, Article)] = articles.map(article => (article.id.get , article))


When I create my graph:

val graph = Graph(vertices, edges)


I get the following error (shortened):

java.io.NotSerializableException: scala.collection.LinearSeqLike$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)


I don't understand why this fails. Why can't I hold my custom class in the graph?

lmm lmm
Answer

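Your class is not serializable because it holds an Iterator that is not serializable. The scala.collection.LinearSeqLike$$anon$1 named in the exception is the anonymous iterator class you get from calling .iterator on a List or another linear sequence. Spark has to serialize tasks and the data they carry in order to move them between cluster nodes, and constructing the graph is one of the points where the vertex attributes get serialized. I'd suggest using List or some other concrete, serializable collection type rather than Iterator for the authors and keywords fields.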
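To see the difference without a cluster, here is a minimal sketch that runs both shapes of the class through plain Java serialization, which is Spark's default serializer. The Author field names, the ArticleWithIterators name, and the isJavaSerializable helper are made up for illustration; only the Article fields come from the question.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the Author class described in the question
// (four optional strings; the field names are assumptions).
case class Author(firstName: Option[String], lastName: Option[String],
                  email: Option[String], affiliation: Option[String])

// Shape from the question: the Iterator fields are the problem.
case class ArticleWithIterators(articleName: String, id: Option[Long],
                                authors: Iterator[Author],
                                keywords: Iterator[String])

// Suggested shape: List is a concrete, serializable collection.
case class Article(articleName: String, id: Option[Long],
                   authors: List[Author], keywords: List[String])

object SerializationCheck {
  // Returns true if the value survives Java serialization,
  // false if some object in its graph is not serializable.
  def isJavaSerializable(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(value) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val authors  = List(Author(Some("Ada"), Some("Lovelace"), None, None))
    val keywords = List("graphx", "spark")

    val withIterators = ArticleWithIterators("Example", Some(1L),
                                             authors.iterator, keywords.iterator)
    val withLists     = Article("Example", Some(1L), authors, keywords)

    println(s"Iterator fields serializable: ${isJavaSerializable(withIterators)}")
    println(s"List fields serializable:     ${isJavaSerializable(withLists)}")
  }
}
```

With List fields the rest of the question's code should work unchanged, since RDD[(VertexId, Article)] only needs the Article type to be serializable. A List can also be traversed more than once, whereas an Iterator is exhausted after a single pass, which matters if the vertex attributes are read repeatedly.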
