Sharan S - 1 month ago
Scala Question

How to pass multiple parameters to partitionBy in Scala?

I am currently using Spark 2.0 and I am trying to write a DataFrame as Parquet, partitioned by multiple columns.

I am trying to execute the following in the Spark shell.

var partitionNames = "partition1,partition2"

var partition = partitionNames.split(",").map(elem => "\""+ elem + "\"").map(elem => elem.mkString) //"partition1","partition2"

df.write.partitionBy(partition).path("s3://")


When I execute the above write command, it gives me an error stating that the partition column does not exist in the DataFrame.

If I hardcode the partition names, it works, but when I pass them as an argument it does not.

Answer Source

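There are two issues here. First, each column name now contains literal " characters (which is probably not what you want), so it no longer matches any column in the DataFrame. Second, partitionBy expects varargs strings (its signature is partitionBy(colNames: String*)), not a single array.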
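Both points can be checked in plain Scala without Spark. The following is a minimal sketch meant to be pasted into a Scala REPL or the Spark shell; describeColumns is a hypothetical stand-in that only mimics the String* parameter shape of partitionBy and is not part of Spark.

```scala
// Hypothetical stand-in with the same parameter shape as partitionBy(colNames: String*).
def describeColumns(colNames: String*): String = colNames.mkString("[", ", ", "]")

val partitionNames = "partition1,partition2"

// The question's transformation: every name gains literal quote characters.
val quoted: Array[String] = partitionNames.split(",").map(elem => "\"" + elem + "\"")

// A plain split (plus trim) keeps the names as they appear in the schema.
val clean: Array[String] = partitionNames.split(",").map(_.trim)

println(quoted.head.length) // 12: the ten characters of partition1 plus two quotes
println(clean.head.length)  // 10

// `: _*` expands the array into individual varargs arguments.
println(describeColumns(clean: _*)) // [partition1, partition2]

// describeColumns(clean) would not compile: an Array[String] is not a String.
```

The quoting is only needed when writing string literals in source code; values that are already strings at runtime should be passed unchanged.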

In any case, assuming partition contains the correct names in its values, you should be doing:

df.write.partitionBy(partition: _*).parquet("s3://")
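For completeness, here is an end-to-end sketch for the Spark shell, assuming Spark 2.x is on the classpath. The sample rows, the value column, and the temporary output directory are all hypothetical and only serve to make the snippet self-contained; substitute your own DataFrame and target path.

```scala
import org.apache.spark.sql.SparkSession

// In the Spark shell a session already exists; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .appName("partitionBy-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; the column names match the question's placeholders.
val df = Seq(
  ("a", "x", 1),
  ("a", "y", 2),
  ("b", "x", 3)
).toDF("partition1", "partition2", "value")

// Split the comma-separated names without adding any quote characters.
val partitionNames = "partition1,partition2"
val partitionCols: Array[String] = partitionNames.split(",").map(_.trim)

// Hypothetical output location: a fresh temporary directory on the local disk.
val outputPath = java.nio.file.Files
  .createTempDirectory("partitionBy-sketch")
  .resolve("out")
  .toString

// `: _*` passes each array element as a separate column name.
df.write.partitionBy(partitionCols: _*).parquet(outputPath)
```

With this input the writer should produce nested directories of the form partition1=a/partition2=x under the output path, one level per partition column in the order given.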