shakedzy shakedzy - 3 months ago 10
Scala Question

Using Scala lists in Spark SQL queries

I'm trying to run the following query:

val IgnoreList = List(""," ","0","-","{}","()","[]","null","Null","NULL","false","False","FALSE","NA","na","Na","n/a","N/a","N/A","nil","Nil","NIL")
val df = sqlContext.sql(s"select userName from names where userName not in $IgnoreList")


But this won't work. I also tried:

val IgnoreList = List(""," ","0","-","{}","()","[]","null","Null","NULL","false","False","FALSE","NA","na","Na","n/a","N/a","N/A","nil","Nil","NIL")
sqlContext.udf.register("SqlList",(s: List[String]) => "('" + s.mkString("','") + "')")
val df = sqlContext.sql(s"select userName from names where userName not in SqlList($IgnoreList)")


But that won't work either. Any suggestions?

Answer

Your first attempt fails because it calls List's default toString, which doesn't return the SQL-valid syntax that you need. Your second attempt fails because there's no sense in using a UDF to build the SQL string - UDFs are to be applied to records (or columns), not to create string queries.

You need the formatting done in the second, combined with the plain String interpolation done in the first:

val IgnoreList = List(""," ","0","-","{}","()","[]","null","Null","NULL","false","False","FALSE","NA","na","Na","n/a","N/a","N/A","nil","Nil","NIL")
val condition = "('" + IgnoreList.mkString("','") + "')"
val df = sqlContext.sql(s"select userName from names where userName not in $condition")

BTW, might be clearer to format the list this way:

IgnoreList.map(s => s"'$s'").mkString(",")
Comments