sai kumar - 11 months ago
Scala Question

How to aggregate data in Spark using Scala?

I have a data set, test1.txt, containing data like the following:

2::1::3
1::1::2
1::2::2
2::1::5
2::1::4
3::1::2
3::1::1
3::2::2


I have created a DataFrame using the code below.

case class Test(userId: Int, movieId: Int, rating: Float)

// Parse one "userId::movieId::rating" line into a Test row.
def pRating(str: String): Test = {
  val fields = str.split("::")
  assert(fields.size == 3)
  Test(fields(0).toInt, fields(1).toInt, fields(2).toFloat)
}

val ratings = spark.read.textFile("C:/Users/test/Desktop/test1.txt").map(pRating).toDF()

The resulting DataFrame contains:

2,1,3
1,1,2
1,2,2
2,1,5
2,1,4
3,1,2
3,1,1
3,2,2


But I want to print output like below, i.e. removing duplicate (userId, movieId) combinations and, instead of the field(2) value, showing the sum of the ratings:

1,1,2.0
1,2,2.0
2,1,12.0
3,1,3.0
3,2,2.0


Please help me with this: how can I achieve it?

Answer Source
ratings.groupBy("userId", "movieId").sum("rating")
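
Putting it all together, here is a minimal end-to-end sketch, assuming Spark 2.x run locally and the same test1.txt path as in the question (the `RatingSum` object name, the `local[*]` master, and the trailing `orderBy` for readable output are assumptions for illustration). The key step is `groupBy` on the two key columns followed by `agg(sum(...))`, which collapses duplicate (userId, movieId) pairs into one row with the summed rating:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object RatingSum {
  case class Test(userId: Int, movieId: Int, rating: Float)

  // Parse one "userId::movieId::rating" line into a Test row.
  def pRating(str: String): Test = {
    val fields = str.split("::")
    assert(fields.size == 3)
    Test(fields(0).toInt, fields(1).toInt, fields(2).toFloat)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RatingSum")
      .getOrCreate()
    import spark.implicits._ // needed for .map over Dataset[String] and .toDF()

    val ratings = spark.read
      .textFile("C:/Users/test/Desktop/test1.txt")
      .map(pRating)
      .toDF()

    // Collapse duplicate (userId, movieId) pairs, summing their ratings.
    val summed = ratings
      .groupBy("userId", "movieId")
      .agg(sum("rating").alias("rating"))
      .orderBy("userId", "movieId")

    summed.show()
    spark.stop()
  }
}
```

Using `agg(sum("rating").alias("rating"))` instead of the shorthand `.sum("rating")` keeps the output column named `rating` rather than `sum(rating)`; both produce the same summed values.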