Christian Yonathan S. - 1 year ago
Scala Question

How to count duplicate values in Scala?

I would like to ask: how can I count duplicate values?

The data has the format: USER, ITEM, EVENT

I want to count how many times each item appears.

Here are some examples:

(US50137,IT1548,7), (US42215,IT6298,7), (US98606,IT5305,7), (US34696,IT5914,7), (US74972,IT2796,7), (US1729,IT7696,7), (US76310,IT9790,7), (US49102,IT6487,7), (US25430,IT7901,7), (US50600,IT4156,7), (US65972,IT9830,7), (US50879,IT1902,7), (US36024,IT6484,7), (US46284,IT3281,7), (US55565,IT5303,7), (US18932,IT2025,7), (US39467,IT8677,7), (US12477,IT9678,7), (US94819,IT8427,7), (US19956,IT1402,7), (US41507,IT3624,7), (US845,IT4823,7), (US18860,IT7860,7), (US68784,IT4759,7), (US79752,IT421,7), (US18563,IT5329,7), (US79628,IT2351,7), (US83729,IT6082,7), (US61097,IT9643,7), (US69368,IT3162,7), (US59566,IT814,7), (US9726,IT7519,7), (US1157,IT5908,7), (US1176,IT3981,7), (US79409,IT8578,7), (US11786,IT5147,7), (US88604,IT8501,7), (US6857,IT2333,7), (US82349,IT6143,7), (US27666,IT9085,7), (US90508,IT352,7), (US48578,IT4503,7), (US14526,IT9551,7), (US29031,IT1992,7), (US57012,IT4353,7), (US97235,IT77,7), (US88666,IT2715,7), (US31035,IT7865,7), (US45054,IT6664,7), (US92069,IT9951,7), (US27175,IT913,7), (US60402,IT8480,7), (US28426,IT9309,7), (US23641,IT4518,7), (US10889,IT7348,7), (US16950,IT6087,7), (US68766,IT683,7), (US87726,IT7594,7), (US63638,IT8101,7), (US78079,IT4344,7), (US47257,IT3315,7), (US3915,IT8971,7), (US59440,IT3441,7), (US64466,IT3980,7), (US79624,IT3502,7), (US29356,IT6778,7)

I based my attempt on this question:

Scala how can I count the number of occurrences in a list

My code:

baris => (
  baris(2) match {
    case "read" => 10
    case "play" => 6
    case "share" => 7

val MM = RATING_SPLITER.groupBy(kk => kk._2).map(x1 => (x1._2))

The output is below:


Any idea why the output looks like that? And is my code correct for counting duplicate values?

Answer

You should map the values resulting from the groupBy to their size. groupBy creates key-value pairs where the value is the collection of all items with the same key, and you're only interested in the size of that collection:

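// sample data:
val RATING_SPLITER = List(("A", "b", 4), ("A", "b", 5), ("A", "c", 6), ("A", "e", 7))

// group by the item (second element), then replace each group with its size;
// .toMap keeps this compiling on Scala 2.13+, where mapValues returns a lazy MapView
val result: Map[String, Int] = RATING_SPLITER.groupBy(_._2).mapValues(_.size).toMap

result.foreach(println)
// prints (order may vary):
// (e,1)
// (b,2)
// (c,1)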
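
To make the difference from the question's code concrete, here is a minimal REPL-style sketch of each step. The question's `.map(x1 => x1._2)` keeps each whole group instead of its size, which is presumably why the output looked unexpected. The sample tuples and the names `events`, `grouped`, `groupsOnly`, `counts` and `duplicates` are made up for illustration; only the (USER, ITEM, EVENT) shape comes from the question.

```scala
// Hypothetical sample in the question's (USER, ITEM, EVENT) shape; values are made up.
val events = List(
  ("US1", "IT10", 7),
  ("US2", "IT10", 7),
  ("US3", "IT20", 7),
  ("US4", "IT10", 6)
)

// groupBy alone keeps every matching tuple under its item key.
val grouped: Map[String, List[(String, String, Int)]] = events.groupBy(_._2)

// Dropping the keys, as the question's code does, leaves only the groups themselves.
val groupsOnly: Iterable[List[(String, String, Int)]] = grouped.map(_._2)

// Replacing each group with its size gives the number of occurrences per item.
val counts: Map[String, Int] = grouped.map { case (item, rows) => item -> rows.size }

// Keeping only items seen more than once gives the actual duplicates.
val duplicates: Map[String, Int] = counts.filter { case (_, n) => n > 1 }
```

Using `map` with a pattern match instead of `mapValues` yields a strict `Map` on both Scala 2.12 and 2.13. On 2.13+ the same count could also be written as `events.groupMapReduce(_._2)(_ => 1)(_ + _)`.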