Raul H - 1 year ago 72
Scala Question

Take first n records from dataframe grouped by unique id

I have a dataset like this:

(image of the dataset not available)

As you can see, it is ordered by rating and userId. I need to get a new DataFrame with only the top 2 results for each unique userId.

I tried to use the rank function, but it does not seem to work. I also tried filtering the DataFrame, with no result. How can I accomplish this?

Answer Source


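import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import spark.implicits._  // enables the $"col" syntax; assumes a SparkSession named spark

val n = 2  // number of rows to keep per user

// Number the rows within each userId, highest rating first
val window = Window.partitionBy("userId").orderBy($"rating".desc)

// Keep only the first n rows of each partition
dataframe.withColumn("r", row_number().over(window)).where($"r" <= n)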
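For anyone who wants to try the approach above end to end, here is a minimal sketch. It assumes a local SparkSession and a made-up dataset with only userId and rating columns. The object name TopNPerUser, the helper topN, and the sample rows are hypothetical and not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopNPerUser {
  // Hypothetical helper: keep the n highest-rated rows for each userId.
  def topN(df: DataFrame, n: Int): DataFrame = {
    // One partition per user, rows ordered from highest to lowest rating
    val byUser = Window.partitionBy("userId").orderBy(col("rating").desc)
    df.withColumn("r", row_number().over(byUser)) // 1, 2, 3, ... within each user
      .where(col("r") <= n)                       // keep the first n rows per user
      .drop("r")                                  // remove the helper column
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-n-per-user")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the dataset in the question
    val ratings = Seq(
      (1, 5.0), (1, 4.0), (1, 3.0),
      (2, 4.5), (2, 2.0), (2, 1.0),
      (3, 3.5)
    ).toDF("userId", "rating")

    // Keeps (1, 5.0), (1, 4.0), (2, 4.5), (2, 2.0), (3, 3.5); display order may vary
    topN(ratings, 2).show()

    spark.stop()
  }
}
```

One possible reason a rank-based filter gives surprising results: rank assigns the same number to tied ratings, so a filter on rank <= n can return more than n rows per user. row_number always assigns distinct numbers, but the order among ties is arbitrary, so add a second column to orderBy if you need a deterministic result.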