Benjamin Benjamin - 1 year ago 138
Scala Question

Value join is not a member of org.apache.spark.rdd.RDD[(Long, T)]

This function seems valid for my IDE:

def zip[T, U](rdd1:RDD[T], rdd2:RDD[U]) : RDD[(T,U)] = {

But when I compile, I get :

value join is not a member of
org.apache.spark.rdd.RDD[(Long, T)] possible cause: maybe a semicolon
is missing before `value join'?

I am in Spark 1.6, I have already tried to import org.apache.spark.rdd.RDD._
and the code inside the function works well when it is directly used on two RDDs outside of a function definition.

Any idea ?

Answer Source

If you change the signature:

def zip[T, U](rdd1:RDD[T], rdd2:RDD[U]) : RDD[(T,U)] = {


def zip[T : ClassTag, U: ClassTag](rdd1:RDD[T], rdd2:RDD[U]) : RDD[(T,U)] = {

This will compile.

Why? join is a method of PairRDDFunctions (your RDD is implicitly converted into that class), which has the following signature:

class PairRDDFunctions[K, V](self: RDD[(K, V)])
  (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null)

This means its constructor expects implicit values of types ClassTag[T] and ClassTag[U], as these will be used as the value types (the V in the PairRDDFunctions definition). Your method has no knowledge of what T and U are, and therefore cannot provide matching implicit values. This means the implicit conversion into PairRDDFunctions "fails" (compiler doesn't perform the conversion) and therefore the method join can't be found.

Adding [K : ClassTag] is shorthand for adding an implicit argument implicit kt: ClassTag[K] to the method, which is then used by compiler and passed to the constructor of PairRDDFunctions.

For more about ClassTags and what they're good for see this good article.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download