Amber Amber - 26 days ago 12
Scala Question

Sort a Spark DataFrame / Hive result set

I'm trying to retrieve the list of columns from a Hive table and store the result in a Spark DataFrame.

var my_column_list = hiveContext.sql(s""" SHOW COLUMNS IN $my_hive_table""")


But I can't sort the DataFrame, or even the result of the SHOW COLUMNS query, alphabetically. I tried both sort() and orderBy().

How could I sort the result alphabetically?

Update: Added a sample of my code

import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql("USE my_test_db")

var lv_column_list = hiveContext.sql(s""" SHOW COLUMNS IN MYTABLE""")
//WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems

lv_column_list.show //Works fine
lv_column_list.orderBy("result").show //Error arises

Answer

Instead of SHOW COLUMNS, I used DESC and selected the "col_name" column from its result. That column can be sorted with orderBy().

var lv_column_list = hiveContext.sql(s""" DESC MYTABLE""")
lv_column_list.select("col_name").orderBy("col_name").show //Prints the column names in alphabetical order
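If you need the names as a plain Scala collection rather than a DataFrame, one option is to collect the "col_name" values to the driver and sort them there. The following is only a minimal sketch: the ColumnNames object and sortedColumnNames helper are hypothetical names, not part of Spark or the original post. It also assumes that DESC on a partitioned table may return blank rows, header rows starting with "#", and repeated partition columns, so it drops those before sorting.

```scala
// Hypothetical helper (not from the original post): cleans and sorts
// column names after they have been collected to the driver.
object ColumnNames {
  def sortedColumnNames(rawNames: Seq[String]): Seq[String] =
    rawNames
      .filter(_ != null)                                      // skip null cells
      .map(_.trim)                                            // drop padding
      .filter(name => name.nonEmpty && !name.startsWith("#")) // skip blank and "# ..." header rows
      .distinct                                               // a partition column may be listed twice
      .sorted                                                 // alphabetical order
}

// Assumed usage with the hiveContext and table from the question:
//   val names = hiveContext.sql("DESC MYTABLE")
//     .select("col_name").collect().map(_.getString(0)).toSeq
//   ColumnNames.sortedColumnNames(names).foreach(println)
```

Sorting on the driver is reasonable here because a table's column list is small. For large result sets, orderBy() on the DataFrame is the better fit.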