user3776105 - 2 months ago 6
Scala Question

Adding/selecting fields to/from RDD

I have an RDD, say dataRdd, with fields like timestamp, url, ...

I want to create a new RDD with a few fields from this dataRdd.

The following code segment creates the new RDD, but timestamp and URL are treated as values and not as field/column names:

var fewfieldsRDD = dataRdd.map(r => ("timestamp" -> r.timestamp, "URL" -> r.url))


However, with the code segment below, one, two, three, arrival, and SFO are treated as column names:

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
val numairRdd= sc.makeRDD(Seq(numbers, airports))


Can anyone tell me what I am doing wrong, and how I can create a new RDD with field names mapped to values from another RDD?

Answer

You are creating an RDD of tuples, not Map objects. Try:

var fewfieldsRDD = dataRdd.map(r => Map("timestamp" -> r.timestamp, "URL" -> r.url))
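
To make the difference concrete, here is a minimal sketch in plain Scala. It rests on assumptions the question does not confirm: Record is a hypothetical case class standing in for the element type of dataRdd, and a local Seq stands in for the RDD so the snippet runs without Spark. The lambdas passed to map are the same ones you would pass to dataRdd.map.

```scala
// Hypothetical record type; the question does not show dataRdd's element type.
final case class Record(timestamp: Long, url: String, status: Int)

object FieldSelection {
  // A local Seq stands in for dataRdd so the sketch runs without Spark.
  val data: Seq[Record] = Seq(
    Record(1L, "http://a.example", 200),
    Record(2L, "http://b.example", 404)
  )

  // Bare parentheses build a tuple of two (String, value) pairs,
  // so "timestamp" and "URL" end up as data inside each element.
  val asTuples: Seq[((String, Long), (String, String))] =
    data.map(r => ("timestamp" -> r.timestamp, "URL" -> r.url))

  // Wrapping the same pairs in Map(...) gives one Map per record,
  // keyed by field name. The values have different types, hence Any.
  val asMaps: Seq[Map[String, Any]] =
    data.map(r => Map[String, Any]("timestamp" -> r.timestamp, "URL" -> r.url))

  def main(args: Array[String]): Unit = {
    println(asTuples.head)      // a tuple of two pairs
    println(asMaps.head("URL")) // lookup by field name
  }
}
```

Note that mixing value types forces the map's value type to Any, so lookups return untyped values. If that becomes awkward, a small case class holding only the selected fields may be a better fit than a Map.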