Avijit Avijit - 1 month ago 20
Scala Question

Spark DataFrame Add Column with Value

I have a

DataFrame
with below data

scala> nonFinalExpDF.show
+---+----------+
| ID| DATE|
+---+----------+
| 1| null|
| 2|2016-10-25|
| 2|2016-10-26|
| 2|2016-09-28|
| 3|2016-11-10|
| 3|2016-10-12|
+---+----------+


From this
DataFrame
I want to get below
DataFrame


+---+----------+----------+
| ID| DATE| INDICATOR|
+---+----------+----------+
| 1| null| 1|
| 2|2016-10-25| 0|
| 2|2016-10-26| 1|
| 2|2016-09-28| 0|
| 3|2016-11-10| 1|
| 3|2016-10-12| 0|
+---+----------+----------+


Logic -


  1. For latest DATE(MAX Date) of an ID, Indicator value would be 1 and others
    are 0.

  2. For null value of the account Indicator would be 1



Please suggest me a simple logic to do that.

Answer

Try

df.createOrReplaceTempView("df")
spark.sql("""
  SELECT id, date,
    CAST(LEAD(COALESCE(date, TO_DATE('1900-01-01')), 1)
    OVER (PARTITION BY id ORDER BY date) IS NULL AS INT)
  FROM df""")