codeBarer - 3 years ago
Scala Question

How can two columns in a SparkSQL dataframe be coalesced?

I have a Spark SQL dataframe that looks like this:

df.select("FirstName","F_Name","Dept").show()

FirstName|F_Name|Dept
---------------------
Alfred   |null  |c1
null     |Jarvis|c2
Jeeves   |null  |c1


I want to be able to coalesce FirstName and F_Name so that I can have a table that looks like this:

Name  |Dept
-----------
Alfred|c1
Jarvis|c2
Jeeves|c1


I tried using coalesce as follows, but it didn't work:

df.select("coalesec(FirstName,F_Name) as Name","Dept").show()


Either a PySpark or a Scala way of doing this would greatly help.

Thanks a bunch.

Answer

The coalesce function is exactly what you are looking for:

import org.apache.spark.sql.functions.coalesce

// Take FirstName when it is not null, otherwise fall back to F_Name
df.select(coalesce(df.col("FirstName"), df.col("F_Name")).alias("Name"), df.col("Dept")).show()
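
Since the question also asks for PySpark, here is a minimal sketch of the same approach, assuming the same DataFrame df; coalesce comes from pyspark.sql.functions:

from pyspark.sql.functions import coalesce

# Take FirstName when it is not null, otherwise fall back to F_Name
df.select(coalesce(df["FirstName"], df["F_Name"]).alias("Name"), df["Dept"]).show()

As a side note, the attempt in the question likely failed because select treats plain strings as column names rather than SQL expressions; selectExpr parses expression strings, so something like df.selectExpr("coalesce(FirstName, F_Name) AS Name", "Dept").show() should also work.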