covariance - 1 month ago
Python Question

Python: collections.Counter vs defaultdict(int)

Suppose I have some data that looks like the following.

Lucy = 1
Bob = 5
Jim = 40
Susan = 6
Lucy = 2
Bob = 30
Harold = 6


I want to 1) remove duplicate keys and 2) add up the values for those duplicate keys. That means I'd end up with these key/value pairs:

Lucy = 3
Bob = 35
Jim = 40
Susan = 6
Harold = 6


Would it be better to use a Counter or a defaultdict (both from collections) for this?

Answer

Both Counter and defaultdict(int) work fine here, but there are a few differences between them:

  • Counter supports most of the operations you can do on a multiset (addition, subtraction, intersection, and union of counts). So, if you want to use those operations, go for Counter.

  • Counter won't add new keys to the dict when you query for missing keys; it just returns 0. A defaultdict(int) inserts the missing key with a value of 0. So, if your queries include keys that may not be present in the dict, it is better to use Counter.

Example:

>>> c = Counter()
>>> d = defaultdict(int)
>>> c[0], d[1]
(0, 0)
>>> c
Counter()
>>> d
defaultdict(<type 'int'>, {1: 0})

  • Counter also has a method called most_common that returns the items sorted by count, highest first. To get the same thing from a defaultdict you'll have to use sorted.

Example:

>>> c = Counter('aaaaaaaaabbbbbbbcc')
>>> c.most_common()
[('a', 9), ('b', 7), ('c', 2)]
>>> c.most_common(2)          # return the 2 most common items and their counts
[('a', 9), ('b', 7)]
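
For comparison, here is a minimal sketch of getting the same result from a defaultdict(int) with sorted. The variable names (text, d, top_two) are only illustrative, and this is one possible way to do it rather than the only one.

```python
from collections import defaultdict

text = 'aaaaaaaaabbbbbbbcc'

# Count characters by hand; defaultdict(int) supplies 0 for unseen keys.
d = defaultdict(int)
for ch in text:
    d[ch] += 1

# Sort the (key, count) pairs by count, highest first, then keep the top two.
top_two = sorted(d.items(), key=lambda item: item[1], reverse=True)[:2]
print(top_two)  # [('a', 9), ('b', 7)]
```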

  • Counter also allows you to create a list of elements from the Counter object: elements() repeats each key as many times as its count.

Example:

>>> c = Counter({'a':5, 'b':3})
>>> list(c.elements())
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b']
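
The multiset operations mentioned in the first point can be sketched as follows. The two counters a and b are made-up sample values, chosen only to show what each operator does.

```python
from collections import Counter

a = Counter({'a': 5, 'b': 3})
b = Counter({'a': 2, 'b': 4, 'c': 1})

total = a + b        # add counts per key
difference = a - b   # subtract counts, dropping results that are zero or negative
common = a & b       # intersection: minimum count per key
either = a | b       # union: maximum count per key

print(total)         # Counter({'a': 7, 'b': 7, 'c': 1})
print(difference)    # Counter({'a': 3})
```

A defaultdict(int) has none of these operators, so the same results would need explicit loops.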

So, depending on what you want to do with the resulting dict, you can choose between Counter and defaultdict(int).
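
For the data in the question, a minimal sketch with both types might look like this. It assumes the input is available as a list of (name, value) pairs; the question does not say how the data is stored, so records is a hypothetical stand-in.

```python
from collections import Counter, defaultdict

# Hypothetical input format: the question's data as (name, value) pairs.
records = [
    ('Lucy', 1), ('Bob', 5), ('Jim', 40), ('Susan', 6),
    ('Lucy', 2), ('Bob', 30), ('Harold', 6),
]

totals_counter = Counter()
totals_dd = defaultdict(int)

# The accumulation loop is identical for both types: missing keys start at 0.
for name, value in records:
    totals_counter[name] += value
    totals_dd[name] += value

print(dict(totals_counter))
# {'Lucy': 3, 'Bob': 35, 'Jim': 40, 'Susan': 6, 'Harold': 6}
```

The summing step is the same either way, so the choice comes down to what you do with the result afterwards.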