Shankar Shankar - 17 days ago 6
Scala Question

How to get the total number of records till the current hour from a lookup table

I have a DataFrame with the lookup table data, for each and every hour there will a entry in this table. How do i calculate the total number of records till the current hour?

For example my DF data

+----+-----+
|hour|count|
+----+-----+
|0.00| 10|
|1.00| 5|
|2.00| 10|
|3.00| 15|
|4.00| 10|
|5.00| 10|
+----+-----+


If i pass "4.00" as input, it should return the total count till 4 hour.

Expected output is:

Total count
50


Sample code i tried:

val df = Seq(("0.00", "10"),
("1.00", "15")).toDF("hour", "reccount")
df.show
df.printSchema

df.registerTempTable("erv")
//sqlContext.sql("select hour,reccount from erv").show
sqlContext.sql("select sum(reccount) over(partition by hour) as running_total from erv").show


But i am getting the below error.


Exception in thread "main" java.lang.RuntimeException: [1.26] failure:
``union'' expected but `(' found

select sum(reccount) over(partition by hour) as running_total from erv


I also tried the Window functions like below, but Its expecting HiveContext needs to be created, when i try to create HiveContext locally its not creating HiveContext.

window function code:

val wSpec = Window.partitionBy("hour").orderBy("hour").rowsBetween(Long.MinValue, 0)
df.withColumn("cumSum", sum(df("reccount")).over(wSpec)).show()

Answer

Not sure why you'd want to use Window Functions if you can simply filter to get the right hours and agg:

val upTo = 4.0
val result = input.filter($"hour" <= upTo).agg(sum($"count") as "Total Count")

result.show()
// +-----------+
// |Total Count|
// +-----------+
// |         50|
// +-----------+