Is there a way to keep the duplicates in a collected set in Hive, or simulate the sort of aggregate collection that Hive provides using some other method? I want to aggregate all of the items in a column that have the same key into an array, with duplicates.
hash_id | num_of_cats
hash_agg | cats_aggregate
Try to use COLLECT_LIST(col) after Hive 0.13.0
SELECT hash_id, COLLECT_LIST(num_of_cats) AS aggr_set FROM tablename WHERE blablabla GROUP BY hash_id ;