ta. ta. - 14 days ago 6
Scala Question

How to convert an RDD of Maps to dataframe

I have RDD of Map and i want to converted it to dataframe
Here is the input format of RDD

val mapRDD: RDD[Map[String, String]] = sc.parallelize(Seq(
Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))


is there any way to convert into dataframe like

val df=mapRDD.toDf


df.show

empid, empName, depId
12 Rohan 201
13 Ross 201
14 Richard 401
15 Michale 501
16 John 701

Answer

You can easily convert it into Spark DataFrame:

Here is a code that would do the trick :

val mapRDD= sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

val columns=mapRDD.take(1).flatMap(a=>a.keys)

val resultantDF=mapRDD.map{value=>
      val list=value.values.toList
      (list(0),list(1),list(2))
      }.toDF(columns:_*)

resultantDF.show()

The output is :

+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
|   12|  Rohan|  201|
|   13|   Ross|  201|
|   14|Richard|  401|
|   15|Michale|  501|
|   16|   John|  701|
+-----+-------+-----+