KingOfHypocrites - 6 days ago
Scala Question

Skip/Take with Spark SQL

How would one go about implementing a skip/take query (typical server-side grid paging) using Spark SQL? I have scoured the net and can only find very basic examples such as these:
https://databricks-training.s3.amazonaws.com/data-exploration-using-spark-sql.html

I don't see any concept of ROW_NUMBER() or OFFSET/FETCH like with T-SQL. Does anyone know how to accomplish this?

Something like:

scala> csc.sql("select * from users skip 10 limit 10").collect()

Answer

Try something like this:

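// If sql() returns a DataFrame (Spark 1.3+), append .rdd here; zipWithIndex is an RDD method
val rdd = csc.sql("select * from <keyspace>.<table>")
// Pair every row with its 0-based position: (row, index)
val rdd2 = rdd.zipWithIndex()
rdd2.filter(x => x._2 > 5 && x._2 < 10).collect()  // rows at index 6 to 9
rdd2.filter(x => x._2 > 9 && x._2 < 12).collect()  // rows at index 10 and 11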
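
The same idea can be wrapped in a small helper that takes skip and take values directly, which maps more closely onto the "skip 10 limit 10" query in the question. This is only a sketch: `Paging` and `skipTake` are hypothetical names, not part of Spark, and it assumes the query result is available as an RDD.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object Paging {
  // Hypothetical helper (not a Spark API): skips the first `skip` elements
  // and returns at most the next `take` elements.
  def skipTake[T: ClassTag](rdd: RDD[T], skip: Long, take: Long): RDD[T] =
    rdd.zipWithIndex()                                   // (element, 0-based index)
      .filter { case (_, idx) => idx >= skip && idx < skip + take }
      .map { case (row, _) => row }                      // drop the index again
}
```

Usage would look something like `Paging.skipTake(rdd, 10, 10).collect()`. Two caveats: the index from zipWithIndex follows partition order, so pages are only stable if the underlying query has a deterministic order (for example an explicit sort), and every page still scans the full result, so this is probably better suited to modest result sets than to deep paging over very large tables.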