Sam Sam - 3 months ago 550
Scala Question

Renaming Column names of a Data frame in spark scala

I am trying to convert all the Headers/ColumnNames of a DataFrame in Spark-scala. as of now i come up with following code which only replaces a single column name. Please help on this.

for( i <- 0 to origCols.length - 1){df.withColumnRenamed(df.columns(i),df.columns(i).toLowerCase);}

Answer

If structure is flat:

val df = Seq((1L, "a", "foo", 3.0)).toDF
df.printSchema
// root
//  |-- _1: long (nullable = false)
//  |-- _2: string (nullable = true)
//  |-- _3: string (nullable = true)
//  |-- _4: double (nullable = false)

the simplest thing you can do is to use toDF method:

val newNames = Seq("id", "x1", "x2", "x3")
val dfRenamed = df.toDF(newNames: _*)

dfRenamed.printSchema
// root
// |-- id: long (nullable = false)
// |-- x1: string (nullable = true)
// |-- x2: string (nullable = true)
// |-- x3: double (nullable = false)

If you want to rename individual columns you can use either select with alias:

df.select($"_1".alias("x1"))

which can be easily generalized to multiple columns:

val lookup = Map("_1" -> "foo", "_3" -> "bar")

df.select(df.columns.map(c => col(c).as(lookup.getOrElse(c, c))): _*)

or withColumnRenamed:

df.withColumnRenamed("_1", "x1")

which use with foldLeft to rename multiple columns:

lookup.foldLeft(df)((acc, ca) => acc.withColumnRenamed(ca._1, ca._2))