Amrith Krishna - 1 year ago
Python Question

From a defaultdict, find the pair of keys with highest values

Given a defaultdict(dict) in which both the outer and inner keys are strings and the values are floats, how do I find the top k key pairs, in descending order of value, across the entire defaultdict?

I could write two nested loops and trivially keep a list of 100 pairs, replacing an entry whenever a pair with a larger value than an existing one is found, which takes n^2 time. But is there a Pythonic way of doing the same?

sample file

defaultdict(<type 'dict'>, {u'just': {u'don': 24.163775416342308, u'like': 28.68171888897304, u'make': 21.69210433035232},'like':{'just':28.68171888897304,'don':12.34, 'mike':27.675}})

desired output (assuming I need only top 3 valued entries from the entire collection)

just,like, 28.68171
like,mike, 27.675
just,don, 24.16377


Whenever you're extracting the largest or smallest values, the heapq.nlargest and heapq.nsmallest functions are your new best friends:

>>> from collections import defaultdict
>>> from heapq import nlargest
>>> from operator import itemgetter
>>> from pprint import pprint

>>> d = defaultdict(dict, 
          {'just': {'don': 24.163775416342308,
                    'like': 28.68171888897304,
                    'make': 21.69210433035232},
           'like': {'don': 12.34,
                    'just': 28.68171888897304,
                    'mike': 27.675}})

>>> flattened = ((outerkey, innerkey, value) for outerkey, innerdict in d.items() 
                 for innerkey, value in innerdict.items())
>>> result = nlargest(3, flattened, key=itemgetter(2))

>>> pprint(result)
[('just', 'like', 28.68171888897304),
 ('like', 'just', 28.68171888897304),
 ('like', 'mike', 27.675)]

In Python 2, it would be more space efficient to use iteritems() instead of items().
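Note that the result above lists the just/like pair twice, because it is stored under both outer keys, whereas the desired output in the question counts it once. A minimal sketch of one way to handle that, assuming the data is symmetric (d[a][b] == d[b][a]) and that a mirrored pair should be reported only in the orientation seen first. The helper name unique_pairs is made up for this illustration and is not part of any library:

```python
from collections import defaultdict
from heapq import nlargest
from operator import itemgetter


def unique_pairs(d):
    """Yield (outerkey, innerkey, value), skipping mirrored duplicates.

    Hypothetical helper: assumes (a, b) and (b, a) carry the same value.
    """
    seen = set()
    for outerkey, innerdict in d.items():
        for innerkey, value in innerdict.items():
            # frozenset ignores order, so ('like', 'just') matches ('just', 'like')
            pair = frozenset((outerkey, innerkey))
            if pair not in seen:
                seen.add(pair)
                yield outerkey, innerkey, value


d = defaultdict(dict,
                {'just': {'don': 24.163775416342308,
                          'like': 28.68171888897304,
                          'make': 21.69210433035232},
                 'like': {'don': 12.34,
                          'just': 28.68171888897304,
                          'mike': 27.675}})

# Same nlargest call as before, fed with the de-duplicated stream
top3 = nlargest(3, unique_pairs(d), key=itemgetter(2))

for outerkey, innerkey, value in top3:
    print('%s,%s, %s' % (outerkey, innerkey, value))
```

The seen set grows with the number of distinct pairs, so this trades some memory for the de-duplication. If the data is not actually symmetric, the plain flattened generator is the safer choice.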