lorenzocastillo lorenzocastillo - 6 months ago 20
Python Question

How to Obtain k Most/Least Frequent in a list?

I have a list of items. My goal is to find the top k most frequent items without using collections.Counter.

Idea #1:
I walk through the list, put the item in a dictionary, along with a count. If I encounter the item again, I increment the count. Reverse keys. (item, count) -> (count, List of items), so now the counts point to items with such count. Walk through counts until I have found k items.

Idea #2, perhaps a bit faster:
I created a wrapper class "Count" that looks like this:

class Count:
def __init__(self, data):
self.data = data
self.count = 1

def inc(self):
self.count += 1
... __hash__, __eq__, etc...


So, I walk through the list, and put Count(item) in a dictionary. If I encounter the item again, I increment the count in the Count class. I will then use Quickselect on dictionary.keys() to get the top k items.

Is there a better/faster way?

Answer

You can use a defaultdict or just a dict to do the job comfortably. This essentially resembles your first laid out path.

from collections import defaultdict
import random

data = [random.randint(1,10) for _ in xrange(100)]

d = defaultdict(int)  # means default value is 0
# d = {}

for x in data:  # go through list
    d[x] += 1  # increment counts
    # d[x] = d.get(x, 0) + 1

# sort dict items by value (count) in descending order
sorted_items = sorted(d.items(), key=lambda i: -i[1])
# sorted_items = sorted(d.items(), key=lambda i: i[1], reverse=True)

# extract the keys
sorted_keys = [k for k, v in sorted_items]

# take n best
n = 8
n_most = sorted_keys[:n]

# [9, 5, 10, 3, 6, 2, 4, 1]

This sorts the dict's items (key-value pairs) by value (count) and slices of the first n keys.