nrksj nrksj - 2 years ago 97
Python Question

Ordering a nested dictionary by the frequency of the nested value

I have this list made from a csv which is massive. For every item in list, I have broken it into its id and details. id is always between 0-3 characters max length and details is variable. I created an empty dictionary, D...(rest of code below):


for v in list:

    id = v[0:3]
    details = v[3:]

    if id not in D:
        D[id] = {}

    if details not in D[id]:
        D[id][details] = 0

    D[id][details] += 1

aside: Can you help me understand what the two if statements are doing? Very new to python and programming.

Anyway, it produces something like this:

{'KEY1_1': {'key2_1' : value2_1, 'key2_2' : value2_2, 'key2_3' : value2_3},
'KEY1_2': {'key2_1' : value2_1, 'key2_2' : value2_2, 'key2_3' : value2_3},
and many more KEY1's with variable numbers of key2's

Each 'KEY1' is unique but each 'key2' isn't necessarily. The
are all different.

Ok so, right now I found a way to sort by the first KEY

for k, v in sorted(D.items()):
    print k, ':', v

I have done enough research to know that dictionaries can't really be sorted but I don't care about sorting, I care about ordering or more specifically frequencies of occurrence. In my code, each nested value is the number of times its corresponding key2 occurs for that particular KEY1. I am starting to think I should have used better variable names.

Question: How do I order the top-level/overall dictionary by the number which is in the nested dictionary? I want to do some statistics to those numbers like...

  1. How many times does the most frequent KEY1_x:key2_x pair show up?

  2. What are the 10, 20, 30 most frequent KEY1_x:key2_x pairs?

Can I only do that by each KEY1 or can I do it overall? Bonus: If I could order it that way for presentation/sharing that would be very helpful because it is such a large data set. So much thanks in advance and I hope I've made my question and intent clear.

Answer Source

You could use Counter to order the key pairs based on their frequency. It also provides an easy way to get the x most frequent items:

from collections import Counter

d = {
    'KEY1': {
        'key2_1': 5,
        'key2_2': 1,
        'key2_3': 3
    },
    'KEY2': {
        'key2_1': 2,
        'key2_2': 3,
        'key2_3': 4
    }
}

c = Counter()
for k, v in d.iteritems():
    c.update({(k, k1): v1 for k1, v1 in v.iteritems()})

print c.most_common(3)


[(('KEY1', 'key2_1'), 5), (('KEY2', 'key2_3'), 4), (('KEY2', 'key2_2'), 3)]
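For the presentation part of the question, one possible approach is to leave the nested dictionary as it is and build an ordered list of rows from it. The sketch below is only an illustration: the helper name ordered_rows and the tie-breaking rules are assumptions, not something from the question, and it assumes every nested dictionary holds at least one entry (which is the case for the loop in the question).

```python
# Sample data in the same shape as the dictionary above.
d = {
    'KEY1': {'key2_1': 5, 'key2_2': 1, 'key2_3': 3},
    'KEY2': {'key2_1': 2, 'key2_2': 3, 'key2_3': 4},
}

def ordered_rows(nested):
    """Hypothetical helper: return (outer_key, [(inner_key, count), ...]) rows.

    Inner pairs are sorted by count, largest first; outer keys are sorted
    by their single largest count, largest first. Ties fall back to the
    key name. Assumes no nested dictionary is empty.
    """
    rows = []
    for outer_key, inner in nested.items():
        pairs = sorted(inner.items(), key=lambda kv: (-kv[1], kv[0]))
        rows.append((outer_key, pairs))
    # row[1][0][1] is the largest count for that outer key.
    rows.sort(key=lambda row: (-row[1][0][1], row[0]))
    return rows

for outer_key, pairs in ordered_rows(d):
    print("%s : %s" % (outer_key, pairs))
```

Because the result is a list, the order survives printing or writing to a file, which a plain dict does not guarantee in older Python versions.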

If you only care about the most common key pairs and have no other reason to build a nested dictionary, you could just use the following code:

from collections import Counter

l = ['foobar', 'foofoo', 'foobar', 'barfoo']
D = Counter((v[:3], v[3:]) for v in l)
print D.most_common() # [(('foo', 'bar'), 2), (('foo', 'foo'), 1), (('bar', 'foo'), 1)]
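The two statistics from the question can be read straight off such a Counter. The following is a minimal sketch with made-up input rows; the variable names (rows, pair_counts, id_totals) are illustrative assumptions, and the syntax is chosen to run under both Python 2.7 and Python 3.

```python
from collections import Counter

# Hypothetical input; in the question this would be the list built from the csv.
rows = ['foobar', 'foofoo', 'foobar', 'barfoo', 'barfoo', 'foobar']

# One flat counter keyed by (id, details) pairs.
pair_counts = Counter((v[:3], v[3:]) for v in rows)

# 1. How many times does the most frequent pair show up?
top_pair, top_count = pair_counts.most_common(1)[0]

# 2. The N most frequent pairs overall (N = 2 here; use 10, 20, 30 as needed).
top_two = pair_counts.most_common(2)

# Per-id totals, for looking at each id on its own rather than overall.
id_totals = Counter()
for (prefix, _details), count in pair_counts.items():
    id_totals[prefix] += count

print(top_pair)
print(top_count)
```

Since the pairs live in one flat Counter, the ranking is overall rather than per id; the id_totals loop shows one way to get back to a per-id view.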

The if statements you asked about check whether a key already exists in a dict:

>>> d = {1: 'foo'}
>>> 1 in d
True
>>> 2 in d
False

So the following code checks whether the key stored in id exists in dict D, and if it doesn't, it assigns an empty dict to that key.

if id not in D:
    D[id] = {}

The second if does the same for the nested dictionary, except that it initializes the missing entry to 0 so that the count can then be incremented.
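To make the role of the two checks concrete, here is a sketch that builds the same structure twice: once with the explicit if statements and once with collections.defaultdict, which creates missing entries automatically. The defaultdict version is an alternative technique, not what the question's code does, and the input rows are made up.

```python
from collections import defaultdict

# Hypothetical input rows: first three characters are the id, the rest is details.
rows = ['foobar', 'foofoo', 'foobar', 'barfoo']

# Explicit version: each "if" creates the missing entry before it is used.
explicit = {}
for v in rows:
    prefix, details = v[:3], v[3:]
    if prefix not in explicit:           # first time this id is seen
        explicit[prefix] = {}
    if details not in explicit[prefix]:  # first time this details is seen for the id
        explicit[prefix][details] = 0
    explicit[prefix][details] += 1

# defaultdict version: a missing id gets a new inner defaultdict,
# and a missing details key gets int(), which is 0.
automatic = defaultdict(lambda: defaultdict(int))
for v in rows:
    automatic[v[:3]][v[3:]] += 1

# Convert back to plain dicts so the two results can be compared.
as_plain = {k: dict(inner) for k, inner in automatic.items()}
print(as_plain == explicit)
```

Without the checks (or a defaultdict), the first D[id][details] += 1 for an unseen key would raise a KeyError.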
