Knows Not Much - 1 year ago
Scala Question

join CassandraTableScanRDD[CassandraRow] with RDD[String]

I am writing a program where I have an RDD[String] and a CassandraTableScanRDD, and I want to do a left join between them.

Is this possible? From what I saw online, joins only happen between CassandraTableScanRDDs.

Answer

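Join functions such as join and leftOuterJoin are available for PairRDD objects. In Scala they come from PairRDDFunctions, which any RDD of two-element tuples picks up through an implicit conversion.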

A PairRDD object is an RDD of key-value pairs, for example: RDD[(Int, String)]

Typically you create a PairRDD object from a regular RDD using the keyBy function, which allows you to specify which key to use. Then when you run join, it joins elements whose keys are equal.
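The keyBy-then-join approach could look roughly like the sketch below. It is only an illustration under some assumptions: UserRow, its name and age fields, and the helper leftJoinByName are hypothetical, and a parallelized collection stands in for the Cassandra table so the example runs in local mode without a cluster. With the Spark Cassandra Connector you would instead obtain the second RDD from sc.cassandraTable(...) and key it with something like row.getString("name"), where the column name depends on your schema.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LeftJoinSketch {
  // Hypothetical stand-in for a Cassandra row; the real RDD would hold CassandraRow.
  final case class UserRow(name: String, age: Int)

  // Key both RDDs by the same value, then left-join them.
  // Every element of `names` is kept; rows without a match come back as None.
  def leftJoinByName(
      names: RDD[String],
      rows: RDD[UserRow]): RDD[(String, (String, Option[UserRow]))] = {
    val namesByKey = names.keyBy(n => n)   // RDD[(String, String)]
    val rowsByKey  = rows.keyBy(_.name)    // RDD[(String, UserRow)]
    namesByKey.leftOuterJoin(rowsByKey)
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen only so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("left-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val names = sc.parallelize(Seq("alice", "bob"))
      val rows  = sc.parallelize(Seq(UserRow("alice", 30)))
      leftJoinByName(names, rows).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Here "alice" is paired with Some(UserRow("alice", 30)) and "bob" with None, since leftOuterJoin keeps every key from the left side. The order of the collected results is not guaranteed.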