
Spark (Scala): How to turn an Array[Row] into either a Dataset[Row] or a DataFrame?

Very simple: I have an Array[Row] and I want to turn it into either a Dataset[Row] or a DataFrame.

How did I come up with an Array of Rows?

Well, I was trying to clear nulls from my dataset:

  • without having to filter EACH column individually (I have a lot of them), and

  • without using the .na.drop() function from DataFrameNaFunctions, because it fails to detect when a cell actually contains the string "null" rather than a real null.

So, I came up with the following line to filter out nulls in all columns:

val outDF = inputDF.columns.flatMap { col => inputDF.filter(col + "!='' AND " + col + "!='null'").collect() }

Problem is, outDF is an Array[Row], hence the question! Any ideas welcome!
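To answer the direct question: an Array[Row] can be rebuilt into a DataFrame by parallelizing it into an RDD[Row] and pairing it with the original schema, and since a DataFrame is just an alias for Dataset[Row], no further conversion is needed. A minimal sketch, where the session setup and sample data are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical local session and sample data; adjust to your environment.
val spark = SparkSession.builder().master("local[*]").appName("rows-to-df").getOrCreate()
import spark.implicits._

val inputDF: DataFrame = Seq(("a", "null"), ("b", "c")).toDF("c1", "c2")

// Suppose we ended up with an Array[Row], e.g. from a collect():
val rows: Array[Row] = inputDF.collect()

// Rebuild a DataFrame: parallelize the rows and reuse the original schema.
val outDF: DataFrame = spark.createDataFrame(spark.sparkContext.parallelize(rows), inputDF.schema)

// A DataFrame is already a Dataset[Row], so this is just a type ascription.
val outDS: Dataset[Row] = outDF
```

Note that collecting to the driver and re-parallelizing defeats the point of a distributed computation; where possible, keep the filtering inside the DataFrame API as in the answer below.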

Answer

I'm posting the answer as per my comment.

df.na.drop(df.columns).where("'null' not in ("+df.columns.mkString(",")+")")
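The one-liner combines two steps: .na.drop(df.columns) drops rows containing a real null in any column, and the where clause drops rows where any column holds the literal string 'null'. The same predicate can be built from typed column expressions instead of a SQL string; a sketch with hypothetical data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical session and data; "null" below is the literal string, not a SQL NULL.
val spark = SparkSession.builder().master("local[*]").appName("drop-null-strings").getOrCreate()
import spark.implicits._

val df = Seq(("a", "b"), ("null", "c"), ("d", null)).toDF("c1", "c2")

// Drop rows with real nulls in any column, then drop rows where any
// column equals the string "null" -- built per column and AND-ed together.
val cleaned = df.na.drop(df.columns)
  .where(df.columns.map(c => col(c) =!= "null").reduce(_ && _))
```

This keeps everything inside the DataFrame API, so the result is a DataFrame (i.e. a Dataset[Row]) and never leaves the cluster.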