Amrith Krishna Amrith Krishna - 1 year ago 51
Python Question

From a defaultdict, find the pair of keys with highest values

Given a

with both the keys as strings and their value is float, then how do I find the k keys based on descending order of the value from teh entire defaultdict.

I can write 2 loops and trivially store a list of 100 pairs, and replace the list with a new entry if something with more than exisitng value is found in n^2 times. But is there a pythonic way of doing the same.

sample file

defaultdict(<type 'dict'>, {u'just': {u'don': 24.163775416342308, u'like': 28.68171888897304, u'make': 21.69210433035232},'like':{'just':28.68171888897304,'don':12.34, 'mike':27.675}}

desired output (assuming I need only top 3 valued entries from the entire collection)

just,like, 28.68171
like,mike, 27.675
just,don, 24.16377

Answer Source

Whenever you're extracting largest and smallests values, the heapq.nlargest and heapq.nsmallest functions are your new best friend:

>>> from heapq import nlargest
>>> from operator import itemgetter
>>> from pprint import pprint

>>> d = defaultdict(dict, 
          {'just': {'don': 24.163775416342308,
                    'like': 28.68171888897304,
                    'make': 21.69210433035232},
           'like': {'don': 12.34,
                    'just': 28.68171888897304,
                    'mike': 27.675}})

>>> flattened = ((outerkey, innerkey, value) for outerkey, innerdict in d.items() 
                 for innerkey, value in innerdict.items())
>>> result = nlargest(3, flattened, key=itemgetter(2))

>>> pprint(result)
[('just', 'like', 28.68171888897304),
 ('like', 'just', 28.68171888897304),
 ('like', 'mike', 27.675)]

In Python 2, it would be more space efficient to use iteritems() instead of items().