Pankaj Singh Pankaj Singh - 2 months ago 7
Python Question

Distinct rows from a list of dictionaries based on values

I have this sample input below sampleInputDbData

def sampleInputDbData( self ):
return \
[
{'FundCode': 300, 'FundName': 'First Fund', 'ProdStartDate': dt(2016,7,3,4,5,6), 'ProdEndDate': dt(2016,8,3,4,5,6), 'FundFee': 100},
{'FundCode': 300, 'FundName': 'First Fund', 'ProdStartDate': dt(2016,8,3,4,5,6), 'ProdEndDate': dt(2016,8,3,6,5,6), 'FundFee': 101 },
{'FundCode': 300, 'FundName': 'First Fund', 'ProdStartDate': dt(2016,8,3,6,5,6), 'ProdEndDate': dt(2016,8,15,6,5,6), 'FundFee': 102 },
{'FundCode': 301, 'FundName': 'Second Fund', 'ProdStartDate': dt(2016,7,3,4,5,6), 'ProdEndDate': dt(2016,8,3,6,5,6), 'FundFee': 110},
{'FundCode': 301, 'FundName': 'Second Fund', 'ProdStartDate': dt(2016,8,3,6,5,6), 'ProdEndDate': dt(2016,8,15,6,5,6), 'FundFee': 111},
{'FundCode': 302, 'FundName': 'Third Fund', 'ProdStartDate': dt(2016,8,3,6,5,6), 'ProdEndDate': dt(2016,8,15,6,5,6), 'FundFee': 120},
]


What I want is this sampleOutputDbData as output

def sampleOutputDbData( self ):
return \
[
{'FundCode': 300, 'FundName': 'First Fund', 'ProdStartDate': dt(2016,8,3,6,5,6), 'ProdEndDate': dt(2016,8,15,6,5,6), 'FundFee': 102 },
{'FundCode': 301, 'FundName': 'Second Fund', 'ProdStartDate': dt(2016,8,3,6,5,6), 'ProdEndDate': dt(2016,8,15,6,5,6), 'FundFee': 111},
{'FundCode': 302, 'FundName': 'Third Fund', 'ProdStartDate': dt(2016,8,3,6,5,6), 'ProdEndDate': dt(2016,8,15,6,5,6), 'FundFee': 120},
]


The decision factor is basically: Get all unique FundCode based on max value of the key ProdEndDate. dt is type datetime

Answer

This works:

from collections import defaultdict
from operator import itemgetter

code_dict = defaultdict(list)
for d in sampleInputDbData:
    code_dict[d["FundCode"]].append(d)

output_data = [max(d, key=itemgetter("ProdEndDate")) for d in code_dict.values()]

I first create a default dict for temporary sorting by FundCode. Each key will contain a list with all dicts with the same FundCode. Then, I'm taking the last ProdEndDate from each list.