Anaadih.pradeep - 1 month ago
Scala Question

Error: type mismatch flatMap

I am new to Spark programming and Scala, and I don't understand the difference between map and flatMap.
I tried the code below expecting both calls to work, but flatMap gave an error.

scala> val b = List("1","2", "4", "5")
b: List[String] = List(1, 2, 4, 5)

scala> b.map(x => (x,1))
res2: List[(String, Int)] = List((1,1), (2,1), (4,1), (5,1))

scala> b.flatMap(x => (x,1))
<console>:28: error: type mismatch;
found : (String, Int)
required: scala.collection.GenTraversableOnce[?]
b.flatMap(x => (x,1))


As I understand it, flatMap turns an RDD into a flat collection of its String/Int elements.
I expected both calls to work without any error. Please let me know where I am making a mistake.

Thanks

Answer

You need to look at how the signatures of these methods are defined:

def map[U: ClassTag](f: T => U): RDD[U]

map takes a function from type T to type U and returns an RDD[U].

On the other hand, flatMap:

def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U]

flatMap expects a function from type T to a TraversableOnce[U], which is a trait that Tuple2 doesn't implement, and returns an RDD[U]. Generally, you use flatMap when you want to flatten a collection of collections, e.g. if you have an RDD[List[List[Int]]] and want to produce an RDD[List[Int]], you can flatMap it using identity.
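The question uses a plain List rather than an RDD, but the error message shows the same requirement: the function passed to flatMap has to return a collection. Below is a minimal sketch on standard Scala collections, with no Spark dependency. The object name FlatMapDemo and the nested sample list are illustrative and not part of the original question. Wrapping the tuple in a List is one way to make the flatMap call compile, and it then gives the same result as map.

```scala
object FlatMapDemo {
  val b: List[String] = List("1", "2", "4", "5")

  // map: the function returns one value per element.
  val mapped: List[(String, Int)] = b.map(x => (x, 1))

  // flatMap: the function must return a collection. Wrapping the tuple
  // in a single-element List satisfies that, and flatMap flattens it away.
  val flatMapped: List[(String, Int)] = b.flatMap(x => List((x, 1)))

  // Flattening a collection of collections with identity
  // (illustrative data, not from the question).
  val nested: List[List[Int]] = List(List(1, 2), List(3), List())
  val flat: List[Int] = nested.flatMap(identity)

  def main(args: Array[String]): Unit = {
    println(mapped)     // List((1,1), (2,1), (4,1), (5,1))
    println(flatMapped) // List((1,1), (2,1), (4,1), (5,1))
    println(flat)       // List(1, 2, 3)
  }
}
```

When each input produces exactly one output, as in the question, map is the simpler choice. flatMap is useful when the function can return zero or more elements per input.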
