Tong Tong - 1 year ago 119
Scala Question

Scala: How can I replace a value in DataFrames using Scala

For example, I want to replace all numbers equal to 0.2 in a column with 0. How can I do that in Scala? Thanks


|year| make|model|             comment|blank|
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|

This is my DataFrame. I'm trying to change Tesla in the make column to S.

Answer Source

Here is my take on this one:

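  import org.apache.spark.sql.{Row, SQLContext}

  // sc is an existing SparkContext (for example, the one provided by spark-shell).
  val rdd = sc.parallelize(
    List((2012, "Tesla", "S"), (1997, "Ford", "E350"), (2015, "Chevy", "Volt"))
  )
  val sqlContext = new SQLContext(sc)

  // this is used to implicitly convert an RDD to a DataFrame.
  import sqlContext.implicits._

  val dataframe = rdd.toDF()

  // Print the original rows.
  dataframe.collect().foreach(println)

  // Replace "tesla" (case-insensitive) in column 1 with "S".
  // On Spark 1.x, DataFrame.map returns an RDD[Row]; on Spark 2.x or later, use dataframe.rdd.map instead.
  dataframe.map(row => {
    val row1 = row.getAs[String](1)
    val make = if (row1.toLowerCase == "tesla") "S" else row1
    Row(row(0), make, row(2))
  }).collect().foreach(println)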


You can actually use map directly on the DataFrame.

So you basically check column 1 for the String tesla. If it's tesla, use the value S for make; otherwise, use the current value of column 1.

Then build a new Row with all the data from the original row, using zero-based indexes (Row(row(0), make, row(2)) in my example).

There is probably a better way to do it. I am not yet that familiar with the Spark umbrella.
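
One possible alternative, offered only as a sketch and not as part of the original answer, is to use Spark's built-in column functions instead of rebuilding each row by hand. The example below assumes Spark 2.0 or later (it uses SparkSession where the answer uses SQLContext). The object name ReplaceValuesSketch, the column names passed to toDF, and the numeric "score" column are all hypothetical and added for illustration. It uses when/otherwise for the Tesla case and na.replace for the 0.2 case from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lower, when}

object ReplaceValuesSketch {
  // Replace "tesla" (case-insensitive) in the make column with "S";
  // every other row keeps its current value.
  def replaceMake(df: DataFrame): DataFrame =
    df.withColumn("make", when(lower(col("make")) === "tesla", "S").otherwise(col("make")))

  // Replace 0.2 with 0.0 in the hypothetical numeric "score" column.
  def replaceScore(df: DataFrame): DataFrame =
    df.na.replace("score", Map(0.2 -> 0.0))

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.0+ installation.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("replace-values-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same rows as in the answer, plus an illustrative "score" column.
    val df = Seq(
      (2012, "Tesla", "S", 0.2),
      (1997, "Ford", "E350", 0.5),
      (2015, "Chevy", "Volt", 0.2)
    ).toDF("year", "make", "model", "score")

    replaceScore(replaceMake(df)).show()
    spark.stop()
  }
}
```

Both helpers return a new DataFrame and leave the input unchanged, so they can be chained. Unlike the map approach, they refer to columns by name, so they do not depend on column positions.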
