Christian Yonathan S. - 2 months ago
Scala Question

How to count duplicate values in Scala?

I would like to ask: how can I count duplicate values?

The data has the format: USER, ITEM, EVENT

I want to count how many times each item appears.

Here are some examples:

(US50137,IT1548,7), (US42215,IT6298,7), (US98606,IT5305,7), (US34696,IT5914,7), (US74972,IT2796,7), (US1729,IT7696,7), (US76310,IT9790,7), (US49102,IT6487,7), (US25430,IT7901,7), (US50600,IT4156,7), (US65972,IT9830,7), (US50879,IT1902,7), (US36024,IT6484,7), (US46284,IT3281,7), (US55565,IT5303,7), (US18932,IT2025,7), (US39467,IT8677,7), (US12477,IT9678,7), (US94819,IT8427,7), (US19956,IT1402,7), (US41507,IT3624,7), (US845,IT4823,7), (US18860,IT7860,7), (US68784,IT4759,7), (US79752,IT421,7), (US18563,IT5329,7), (US79628,IT2351,7), (US83729,IT6082,7), (US61097,IT9643,7), (US69368,IT3162,7), (US59566,IT814,7), (US9726,IT7519,7), (US1157,IT5908,7), (US1176,IT3981,7), (US79409,IT8578,7), (US11786,IT5147,7), (US88604,IT8501,7), (US6857,IT2333,7), (US82349,IT6143,7), (US27666,IT9085,7), (US90508,IT352,7), (US48578,IT4503,7), (US14526,IT9551,7), (US29031,IT1992,7), (US57012,IT4353,7), (US97235,IT77,7), (US88666,IT2715,7), (US31035,IT7865,7), (US45054,IT6664,7), (US92069,IT9951,7), (US27175,IT913,7), (US60402,IT8480,7), (US28426,IT9309,7), (US23641,IT4518,7), (US10889,IT7348,7), (US16950,IT6087,7), (US68766,IT683,7), (US87726,IT7594,7), (US63638,IT8101,7), (US78079,IT4344,7), (US47257,IT3315,7), (US3915,IT8971,7), (US59440,IT3441,7), (US64466,IT3980,7), (US79624,IT3502,7), (US29356,IT6778,7)

From this link:

Scala how can I count the number of occurrences in a list

My code:

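// lines: the raw input, one "USER,ITEM,EVENT" string per element (assumed; not shown in the question)
val RATING_SPLITER = lines.map(_.split(",")).map { baris =>
  val rating = baris(2) match {
    case "read"  => 10
    case "play"  => 6
    case "share" => 7
  }
  (baris(0), baris(1), rating)
}
// groups the tuples by ITEM, then keeps only each group (the tuples themselves), not a count
val MM = RATING_SPLITER.groupBy(kk => kk._2).map(x1 => x1._2)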

And this is the output:


Any idea why the output looks like that? And is my code a correct way to count duplicate values?
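To illustrate why the output is a set of collections rather than numbers, here is a minimal sketch on made-up tuples (the object name `WhyGroupsNotCounts` and the sample values are hypothetical, not taken from the question). `groupBy` keys each tuple by its ITEM field and keeps every matching tuple as the value, so mapping each pair to `x1._2` leaves only those groups:

```scala
// Sketch with made-up sample tuples (not the data from the question).
object WhyGroupsNotCounts {
  val ratings = List(("US1", "IT1", 7), ("US2", "IT1", 7), ("US3", "IT2", 10))

  // groupBy keys each element by its ITEM field; every value is the full list of matching tuples
  val grouped: Map[String, List[(String, String, Int)]] = ratings.groupBy(_._2)

  // Keeping only the values (as x1 => x1._2 does) drops the keys and leaves the groups themselves
  val groupsOnly: List[List[(String, String, Int)]] = grouped.map(_._2).toList

  def main(args: Array[String]): Unit = {
    grouped.foreach(println)
    // e.g. (IT1,List((US1,IT1,7), (US2,IT1,7))) and (IT2,List((US3,IT2,10))), in no guaranteed order
  }
}
```

Nothing in this pipeline ever asks for the size of a group, which is why no counts appear.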


You should map the values resulting from the groupBy to their size. groupBy creates key-value pairs where each value is the collection of all items with the same key, and you're only interested in the size of that collection:

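// sample data:
val RATING_SPLITER = List(("A", "b", 4), ("A", "b", 5), ("A", "c", 6), ("A", "e", 7))

// .toMap: on Scala 2.13 mapValues is deprecated and returns a MapView; on 2.12 it makes the lazy view strict
val result: Map[String, Int] = RATING_SPLITER.groupBy(_._2).mapValues(_.size).toMap
result.foreach(println)
// prints (order not guaranteed):
// (e,1)
// (b,2)
// (c,1)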
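If you are on Scala 2.13 or later, `groupMapReduce` does the grouping and counting in one step without materialising each group; a `foldLeft` over an immutable `Map` also works on older versions. A minimal sketch (the object and method names are made up for illustration):

```scala
object ItemCounts {
  // Scala 2.13+: group by ITEM, map every tuple to 1, and sum the 1s per key
  def countByItem(ratings: List[(String, String, Int)]): Map[String, Int] =
    ratings.groupMapReduce(_._2)(_ => 1)(_ + _)

  // Also works on Scala 2.12: one pass, incrementing the count for each ITEM seen
  def countByItemFold(ratings: List[(String, String, Int)]): Map[String, Int] =
    ratings.foldLeft(Map.empty[String, Int]) { case (acc, (_, item, _)) =>
      acc.updated(item, acc.getOrElse(item, 0) + 1)
    }

  def main(args: Array[String]): Unit = {
    val RATING_SPLITER = List(("A", "b", 4), ("A", "b", 5), ("A", "c", 6), ("A", "e", 7))
    println(countByItem(RATING_SPLITER) == countByItemFold(RATING_SPLITER)) // true
  }
}
```

Both return the same `Map[String, Int]` as the `groupBy` version; the difference is only that they avoid building a list per item first.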