Yousuf Zaman - 1 year ago
Scala Question

Spark Scala Dynamic column selection from DataFrame

I have a DataFrame with columns of different types. From those columns, I need to retrieve specific columns.
A hard-coded DataFrame select statement would look like this:

val logRegrDF = myDF.select(myDF("LEBEL_COLUMN").as("label"),
col("FEATURE_COL1"), col("FEATURE_COL2"), col("FEATURE_COL3"), col("FEATURE_COL4"))


Here LEBEL_COLUMN and the FEATURE_COLs will be dynamic.
I have an Array or Seq of those feature columns, like this:

val FEATURE_COL_ARR = Array("FEATURE_COL1","FEATURE_COL2","FEATURE_COL3","FEATURE_COL4")


I need to use this Array of column names as the second part of that select statement.
In the select, the first column will be a single column (LEBEL_COLUMN) and the rest will come from the dynamic list.

Can you please help me make this select statement work in Scala?

Note:
The sample code given below works, but I need the column array to come after the label column in the select:

val colNames = FEATURE_COL_ARR.map(name => col(name))
val logRegrDF = myDF.select(colNames:_*) // it is not the requirement


I think the code for the second part should look like this, but it does not work:

val logRegrDF = myDF.select(myDF("LEBEL_COLUMN").as("label"), colNames:_*)

Answer Source

If I understand your question correctly, this should be what you are looking for:

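// Prepend the label column name to the dynamic feature column names
val allColumnsArr = "LEBEL_COLUMN" +: FEATURE_COL_ARR
// select(col: String, cols: String*) takes the first name separately from the varargs
val logRegrDF = myDF.select(allColumnsArr.head, allColumnsArr.tail: _*)
  .withColumnRenamed("LEBEL_COLUMN", "label")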
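
The attempt in the question fails because Scala only lets a `: _*` argument fill the whole varargs parameter, and `select` is overloaded as `select(cols: Column*)` and `select(col: String, cols: String*)`, so a single `Column` followed by a splat matches neither. If you would rather keep `Column` objects and alias the label inside the select, one possible sketch is to prepend the aliased column to the sequence and splat the combined list. The object name `DynamicSelect`, the helper `selectLabelAndFeatures`, the local SparkSession, and the sample rows are assumptions made for illustration, not part of the original code:

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DynamicSelect {
  // Hypothetical helper: builds one Seq[Column] with the aliased label first,
  // then expands the whole sequence into select(cols: Column*).
  def selectLabelAndFeatures(df: DataFrame,
                             labelCol: String,
                             featureCols: Seq[String]): DataFrame = {
    val columns: Seq[Column] = df(labelCol).as("label") +: featureCols.map(col)
    df.select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session and made-up data, only to make the sketch runnable.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("dynamic-select")
      .getOrCreate()
    import spark.implicits._

    val myDF = Seq(
      (1.0, 0.1, 0.2, 0.3, 0.4, "ignored"),
      (0.0, 0.5, 0.6, 0.7, 0.8, "ignored")
    ).toDF("LEBEL_COLUMN", "FEATURE_COL1", "FEATURE_COL2",
           "FEATURE_COL3", "FEATURE_COL4", "OTHER_COL")

    val FEATURE_COL_ARR =
      Array("FEATURE_COL1", "FEATURE_COL2", "FEATURE_COL3", "FEATURE_COL4")

    val logRegrDF =
      selectLabelAndFeatures(myDF, "LEBEL_COLUMN", FEATURE_COL_ARR.toSeq)

    // Prints: label, FEATURE_COL1, FEATURE_COL2, FEATURE_COL3, FEATURE_COL4
    println(logRegrDF.columns.mkString(", "))

    spark.stop()
  }
}
```

This does the rename in the same step as the select, so no separate `withColumnRenamed` call is needed.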

Hope this helps!
