Christian Yonathan S. - 3 months ago
Scala Question

How to count duplicate values in Scala?

I would like to ask how I can count duplicate values.

The data has this format: USER, ITEM, EVENT

I want to count how many times each item appears.

Here are some examples:


(US50137,IT1548,7), (US42215,IT6298,7), (US98606,IT5305,7), (US34696,IT5914,7), (US74972,IT2796,7), (US1729,IT7696,7), (US76310,IT9790,7), (US49102,IT6487,7), (US25430,IT7901,7), (US50600,IT4156,7), (US65972,IT9830,7), (US50879,IT1902,7), (US36024,IT6484,7), (US46284,IT3281,7), (US55565,IT5303,7), (US18932,IT2025,7), (US39467,IT8677,7), (US12477,IT9678,7), (US94819,IT8427,7), (US19956,IT1402,7), (US41507,IT3624,7), (US845,IT4823,7), (US18860,IT7860,7), (US68784,IT4759,7), (US79752,IT421,7), (US18563,IT5329,7), (US79628,IT2351,7), (US83729,IT6082,7), (US61097,IT9643,7), (US69368,IT3162,7), (US59566,IT814,7), (US9726,IT7519,7), (US1157,IT5908,7), (US1176,IT3981,7), (US79409,IT8578,7), (US11786,IT5147,7), (US88604,IT8501,7), (US6857,IT2333,7), (US82349,IT6143,7), (US27666,IT9085,7), (US90508,IT352,7), (US48578,IT4503,7), (US14526,IT9551,7), (US29031,IT1992,7), (US57012,IT4353,7), (US97235,IT77,7), (US88666,IT2715,7), (US31035,IT7865,7), (US45054,IT6664,7), (US92069,IT9951,7), (US27175,IT913,7), (US60402,IT8480,7), (US28426,IT9309,7), (US23641,IT4518,7), (US10889,IT7348,7), (US16950,IT6087,7), (US68766,IT683,7), (US87726,IT7594,7), (US63638,IT8101,7), (US78079,IT4344,7), (US47257,IT3315,7), (US3915,IT8971,7), (US59440,IT3441,7), (US64466,IT3980,7), (US79624,IT3502,7), (US29356,IT6778,7)





I based my code on this question:

Scala how can I count the number of occurrences in a list


My code:


val RATING_SPLITER = N1.map(
  {
    baris => (
      baris(0),
      baris(1),
      baris(2) match {
        case "read" => 10
        case "play" => 6
        case "share" => 7
      }
    )
  }
).take(1000)
val MM = RATING_SPLITER.groupBy(kk => kk._2).map(x1 => (x1._2))
MM.foreach(println)


This is the output:


[Lscala.Tuple3;@fd53053
[Lscala.Tuple3;@4527f70a
[Lscala.Tuple3;@707b1a44
[Lscala.Tuple3;@7132a9dc
[Lscala.Tuple3;@57435801
[Lscala.Tuple3;@2da66a44
[Lscala.Tuple3;@527fc8e
[Lscala.Tuple3;@61bfc9bf
[Lscala.Tuple3;@2c7106d9
[Lscala.Tuple3;@329bad59





Any idea why the output looks like that? And is my code correct for counting duplicate values?


Answer

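The output looks like that because each printed value is an Array of tuples, and a JVM array prints as its type name and hash code (for example [Lscala.Tuple3;@fd53053) rather than its contents. The grouping itself is fine, but it does not count anything yet: groupBy creates key-value pairs where the value is the collection of all items with the same key. You are only interested in the size of that collection, so map each group to its size: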

// sample data:
val RATING_SPLITER = List(("A", "b", 4), ("A", "b", 5), ("A", "c", 6), ("A", "e", 7))

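// map each (item, group) pair to (item, group size); mapValues(_.size) is equivalent on Scala 2.12 and earlier, but returns a lazy view on 2.13+
val result: Map[String, Int] = RATING_SPLITER.groupBy(_._2).map { case (item, group) => (item, group.size) }
result.foreach(println)
// prints (order may vary):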
// (e,1)
// (b,2)
// (c,1)
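
Applied to data in the question's (USER, ITEM, EVENT) format, the same idea might look like the sketch below. The object name ItemCounts, the helpers countByItem and mostShown, and the sample rows are all made up for illustration; they are not part of the asker's code. The sketch uses only the standard collections, so it assumes the rows are already in a local Array, as they are after take(1000).

```scala
object ItemCounts {
  // (USER, ITEM, EVENT score), the tuple shape used in the question.
  type Event = (String, String, Int)

  // Number of rows per ITEM (the second tuple field).
  def countByItem(events: Seq[Event]): Map[String, Int] =
    events.groupBy(_._2).map { case (item, rows) => item -> rows.size }

  // Same counts, ordered from most to least frequent; ties are broken by item id.
  def mostShown(events: Seq[Event]): Seq[(String, Int)] =
    countByItem(events).toSeq.sortBy { case (item, n) => (-n, item) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows, not taken from the question's data.
    val events: Array[Event] = Array(
      ("US50137", "IT1548", 7),
      ("US42215", "IT6298", 7),
      ("US98606", "IT1548", 10),
      ("US34696", "IT1548", 6),
      ("US74972", "IT6298", 7)
    )

    // Grouping an Array yields Array values; mkString shows their contents
    // instead of the default "[Lscala.Tuple3;@..." form.
    events.groupBy(_._2).foreach { case (item, rows) =>
      println(s"$item -> ${rows.mkString(", ")}")
    }

    // Prints (IT1548,3) and then (IT6298,2).
    mostShown(events.toSeq).foreach(println)
  }
}
```

Sorting the counts is optional; it is included here only because a plain Map has no guaranteed iteration order, which makes the printed result harder to read.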