Adam Pitt - 1 month ago
Scala Question

Scala Spark 1.5.2 RDD[Either[A,B]]

The issue I am having is that I seem to have tumbled down a dark path by trying to use Scala's Either within an RDD.

My application reads data using a Spark context into an RDD[String].

Each element of this RDD[String] is then parsed into a Left[A] or a Right[B] (giving an RDD[Either[A,B]]), as I want unparsable records to be kept so I can sink them elsewhere.

I have come to the point where I would like to treat A and B differently, so I tried to call

val left: RDD[A] = (x: RDD[Either[A, B]]).map(_.left.get)


The issue here is that if x doesn't have any errors (left side), this will throw an exception, because .left.get fails on any element that is a Right. I can try to catch the exception, but map still needs me to produce a result of type RDD[A], which can't be done with sc.emptyRDD[A] or .getOrElse.

If anyone has a working solution, or could correct me on best practice, I would appreciate it. I'm guessing Eithers are not meant to be used in conjunction with RDDs.

Answer

Try

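// collect with a partial function keeps only the elements it matches, so Rights are skipped instead of throwing
val left: RDD[A] = x.collect { case Left(a) => a }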
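
To make the collect call above concrete, here is a minimal sketch that splits one RDD into its Left and Right sides. The object name SplitEithers, the parse helper, the local master setting, and the use of Either[String, Int] in place of Either[A, B] are all assumptions made for illustration, not part of the original question. With generic A and B, the partial-function form of collect would also need a ClassTag for the result type in scope.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SplitEithers {
  // Hypothetical parser: Left keeps the raw unparsable line, Right holds the parsed value.
  def parse(line: String): Either[String, Int] =
    try Right(line.trim.toInt)
    catch { case _: NumberFormatException => Left(line) }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("split-eithers").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val parsed: RDD[Either[String, Int]] =
        sc.parallelize(Seq("1", "two", "3")).map(parse)
      parsed.cache() // the RDD is traversed twice below, once per side

      // Each collect keeps only the elements its partial function is defined for.
      val failures: RDD[String] = parsed.collect { case Left(raw) => raw }
      val values: RDD[Int] = parsed.collect { case Right(n) => n }

      println(failures.collect().toList)
      println(values.collect().toList)
    } finally sc.stop()
  }
}
```

If one side happens to be empty, the corresponding collect simply yields an empty RDD, so no exception handling or sc.emptyRDD fallback is needed.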

I would put errors on the left and correctly parsed values on the right, as that is the usual convention. Scala 2.12 also makes Either right-biased, which formally encodes that convention.
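
As a small illustration of that convention, the sketch below uses plain Scala 2.12 or later with no Spark involved. The object name RightBiasedEither and the parseInt helper are hypothetical. On Scala 2.10 or 2.11, which Spark 1.5.2 is built against, the same calls would need to go through the .right projection (for example, .right.map).

```scala
object RightBiasedEither {
  // Hypothetical parser following the convention: errors on the Left, values on the Right.
  def parseInt(s: String): Either[String, Int] =
    try Right(s.trim.toInt)
    catch { case _: NumberFormatException => Left(s"not a number: $s") }

  def main(args: Array[String]): Unit = {
    // Scala 2.12+: map acts on the Right value and passes a Left through untouched.
    val doubled: Either[String, Int] = parseInt("21").map(_ * 2)
    val failed: Either[String, Int] = parseInt("x").map(_ * 2)

    // for-comprehensions work directly on Either; the first Left short-circuits.
    val sum: Either[String, Int] = for {
      a <- parseInt("1")
      b <- parseInt("2")
    } yield a + b

    println(doubled) // Right(42)
    println(failed)  // Left(not a number: x)
    println(sum)     // Right(3)
  }
}
```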