mongolol - 1 month ago
Scala Question

Removing Blank Strings from a Spark Dataframe

I'm attempting to remove rows in which a Spark DataFrame column contains blank strings. Originally I did

val df2 = df1.na.drop()
but it turns out many of these values are being encoded as "".

I'm stuck on Spark 1.3.1 and also cannot rely on the DSL. (Importing spark.implicit_ isn't working.)

Answer

Removing rows from a DataFrame requires filter().

val newDF = oldDF.filter("colName != ''")

or am I misunderstanding your question?
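If blanks can show up in more than one column, the same string-based filter can be built programmatically. The following is only a sketch: BlankFilter and nonBlankExpr are hypothetical names, and it assumes the column names are plain identifiers that need no quoting. It only builds the expression string, so it needs neither the DSL nor the implicits import.

```scala
// Hypothetical helper (not part of Spark): builds a SQL filter expression
// that keeps a row only when every listed column is different from ''.
object BlankFilter {
  def nonBlankExpr(columns: Seq[String]): String = {
    // An empty expression would not be a valid filter, so reject it early.
    require(columns.nonEmpty, "at least one column name is required")
    // Assumes plain column names; names with spaces or dots would need quoting.
    columns.map(c => s"$c != ''").mkString(" AND ")
  }
}

// Usage against the DataFrame from the question (df1 is assumed to exist):
//   val df2 = df1.filter(BlankFilter.nonBlankExpr(df1.columns))
```

Under standard SQL comparison semantics, a null value compared with '' is not true, so rows with nulls in those columns should be dropped as well. On Spark 1.x, df1.filter(df1("colName") !== "") should be an equivalent form that does not need the implicits import; later releases renamed that operator to =!=.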