Benjamin Benjamin - 1 year ago 99
Scala Question

Value join is not a member of org.apache.spark.rdd.RDD[(Long, T)]

This function seems valid for my IDE:

def zip[T, U](rdd1:RDD[T], rdd2:RDD[U]) : RDD[(T,U)] = {

But when I compile, I get :

value join is not a member of
org.apache.spark.rdd.RDD[(Long, T)] possible cause: maybe a semicolon
is missing before `value join'?

I am in Spark 1.6, I have already tried to import org.apache.spark.rdd.RDD._
and the code inside the function works well when it is directly used on two RDDs outside of a function definition.

Any idea ?

Answer Source

If you change the signature:

def zip[T, U](rdd1:RDD[T], rdd2:RDD[U]) : RDD[(T,U)] = {


def zip[T : ClassTag, U: ClassTag](rdd1:RDD[T], rdd2:RDD[U]) : RDD[(T,U)] = {

This will compile.

Why? join is a method of PairRDDFunctions (your RDD is implicitly converted into that class), which has the following signature:

class PairRDDFunctions[K, V](self: RDD[(K, V)])
  (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null)

This means its constructor expects implicit values of types ClassTag[T] and ClassTag[U], as these will be used as the value types (the V in the PairRDDFunctions definition). Your method has no knowledge of what T and U are, and therefore cannot provide matching implicit values. This means the implicit conversion into PairRDDFunctions "fails" (compiler doesn't perform the conversion) and therefore the method join can't be found.

Adding [K : ClassTag] is shorthand for adding an implicit argument implicit kt: ClassTag[K] to the method, which is then used by compiler and passed to the constructor of PairRDDFunctions.

For more about ClassTags and what they're good for see this good article.