Tong Tong - 1 month ago 15
Scala Question

Scala: How can I replace value in Dataframs using scala

For example I want to replace all numbers equal to 0.2 in a column to 0. How can I do that in Scala? Thanks

Edit:

|year| make|model| comment |blank|
|2012|Tesla| S | No comment | |
|1997| Ford| E350|Go get one now th...| |
|2015|Chevy| Volt| null | null|


This is my Dataframe I'm trying to change Tesla in make column to S

Answer

Here is my take on this one:

 val rdd = sc.parallelize(
      List( (2012,"Tesla","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
  )
  val sqlContext = new SQLContext(sc)

  // this is used to implicitly convert an RDD to a DataFrame.
  import sqlContext.implicits._

  val dataframe = rdd.toDF()

  dataframe.foreach(println)

 dataframe.map(row => {
    val row1 = row.getAs[String](1)
    val make = if (row1.toLowerCase == "tesla") "S" else row1
    Row(row(0),make,row(2))
  }).collect().foreach(println)

//[2012,S,S]
//[1997,Ford,E350]
//[2015,Chevy,Volt]

You can actually use directly map on the DataFrame.

So you basically check the column 1 for the String tesla. If it's tesla, use the value S for make else you the current value of column 1

Then build a tuple with all data from the row using the indexes (zero based) (Row(row(0),make,row(2))) in my example)

There is probably a better way to do it. I am not that familiar yet with the Spark umbrella

Comments