senthil kumar p senthil kumar p - 8 months ago 42
Scala Question

How to update column based on a condition (a value in a group)?

I have one df

| 1| fn| red|
| 2| fn| blue|
| 3| fn|green|

If any of the color column value is red then I should update all columns values as red

like below

| 1| fn| red|
| 2| fn| red|
| 3| fn| red|

I could not figure it out. Any help please
I have tired following code

val gp=jdbcDF.filter($"dept".contains("fn"))
val row1=row.getAs[String](1)
var row2=row.getAs[String](2)
val make=if(row1 =="fn") row2="red"

Answer Source


val df = Seq(
  (1, "fn", "red"),
  (2, "fn", "blue"),
  (3, "fn", "green"),
  (4, "aa", "blue"),
  (5, "aa", "green"),
  (6, "bb", "red"),
  (7, "bb", "red"),
  (8, "aa", "blue")
).toDF("id", "fn", "color")

Do the calculation:

val redOrNot = df.groupBy("fn")
  .agg(collect_set('color) as "values")
  .withColumn("hasRed", array_contains('values, "red"))

val result = df.join(redOrNot, "fn")
  .withColumn("resultColor", when('hasRed, "red")) // gives null for no option
  .withColumn("color", coalesce('resultColor, 'color)) // skips nulls and gives the answer
  .select('id, 'fn, 'color)

The result looks as follows (that seems to be an answer):

| id| fn|color|
|  1| fn|  red|
|  2| fn|  red|
|  3| fn|  red|
|  4| aa| blue|
|  5| aa|green|
|  6| bb|  red|
|  7| bb|  red|
|  8| aa| blue|

You can chain when operators and have a default value with otherwise. Consult the scaladoc of when operator.

I think you could do something very similar (and perhaps more efficient) using windowed operators or user-defined aggregate functions (UDAF), but...well...don't currently know how to do it. Leaving the comment here to inspire others ;-)

p.s. Learnt a lot! Thanks for the idea!