Newbie - 2 months ago
Scala Question

How to convert a dataframe to RDD[String, String]?

I have a data frame

df : [id: String, country: String, title: String]


How do I convert it to RDD[String, String], where the first column is the key and a JSON string built from the remaining columns is the value?

key : id
value : {country: "US", title: "MK"}

Answer

You cannot have an RDD[String, String]. RDD takes only one type parameter, so what you want is an RDD[(String, String)], that is, an RDD of (key, value) tuples. You can build it by mapping over the rows:

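df.rdd
  .map(row => {
    val id = row.getString(0)
    val country = row.getString(1)
    val title = row.getString(2)

    // Quote keys and values so the result is valid JSON.
    // Note: quotes or backslashes inside the values are not escaped here.
    val jsonString = s"""{"country": "$country", "title": "$title"}"""

    (id, jsonString)
  })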
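
If you would rather not assemble the JSON by hand, one alternative is to let Spark serialize the remaining columns for you. The sketch below assumes Spark 2.1 or later (where `to_json` is available in `org.apache.spark.sql.functions`) and that the key column is a string. The object name `DataFrameToPairRdd`, the helper `toPairRdd`, and the `keyCol` parameter are hypothetical names chosen for illustration, not part of the original answer.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}

object DataFrameToPairRdd {

  // Hypothetical helper: uses `keyCol` as the key and serializes every
  // other column into a single JSON string via Spark's built-in to_json.
  def toPairRdd(df: DataFrame, keyCol: String = "id"): RDD[(String, String)] = {
    val spark = df.sparkSession
    import spark.implicits._ // provides the encoder for (String, String)

    // All columns except the key go into the JSON value.
    val valueCols = df.columns.filter(_ != keyCol).map(col)

    df.select(col(keyCol), to_json(struct(valueCols: _*)).as("value"))
      .as[(String, String)]
      .rdd
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would reuse its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("df-to-pair-rdd")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows matching the schema in the question.
    val df = Seq(("1", "US", "MK"), ("2", "FR", "a \"quoted\" title"))
      .toDF("id", "country", "title")

    toPairRdd(df).collect().foreach(println)
    spark.stop()
  }
}
```

Compared with string interpolation, this leaves quoting and escaping to Spark, and it keeps working if more columns are added to the data frame later.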