bobdobalina bobdobalina - 6 months ago 42
Python Question

Convert a nested dict to a dict that only include list items that has a unique set of keys

How do I convert the following nested dict to a dict that only include list items that has a unique set of keys (regardless of the values)?

I don't know how many levels of nesting there are, and I don't care which list item that is returned, as long as it has an unique set of keys (for the list that the item is part of)

(I am trying to generate an example file from a very long YAML file for documentation purposes)

input = {
"mylist": [
{
"key1": "1333",
"key2": [
{
"key2a":134,
"key2b":1373
},
{
"key2a":124,
"key2b":136
}
]
},{
"key1": "875",
"key2": [
{
"key2a":999,
"key2b":6567
},
{
"key2a":8765,
"key2b":875
}
]
},{
"key1": "6754",
"key3": 3232
},{
"key1": "34545",
"key3": 34554
}
]
}


needed output:

{
"mylist": [
{
"key1": "1333",
"key2": [
{ "key2a":134,
"key2b":1373
}
]
},{
"key1": "6754",
"key3": 3232
}
]
}


I made this (verbose) code that solves it by fetching and storing all keys it finds in list item objects, but I am sure this can be done in a shorter way?

input = collections.OrderedDict(input)
def get_keys(obj,keys=[]):
if isinstance(obj, (dict,collections.OrderedDict)):
for k, v in obj.items():
if not isinstance(v, (dict,collections.OrderedDict)):
keys.append(k)
get_keys(v,keys)
elif isinstance(obj, list):
for elem in obj:
if not isinstance(elem, (dict,collections.OrderedDict,list)):
keys.append(elem)
get_keys(elem,keys)
return keys

def traverse(obj, callback=None):
if isinstance(obj, (dict,collections.OrderedDict)):
value = {k: traverse(v, callback)
for k, v in obj.items()}
elif isinstance(obj, list):
value = [traverse(elem, callback)
for elem in obj]
else:
value = obj
if callback is None:
return value
else:
return callback(value)

def traverse_modify(obj):
def yaml_shortener(obj):
duplicates = []
if isinstance(obj,list) and len(obj)>1:
return_list = []
for i,elem in enumerate(obj):
if not any(Counter(get_keys(elem,keys=[])) == Counter(item) for item in duplicates):
return_list.append(elem)
duplicates.append(get_keys(elem,keys=[]))
return return_list
else:
return obj
return traverse(obj, callback=yaml_shortener)

def shorten_yaml(obj):
return traverse_modify(obj)

print json.dumps(shorten_yaml(input),indent=3)

Answer

First of all I assume that the value associated to a dictionary key is either a list of dictionaries or a scalar value. This value is processed by function convert_value(), which:

  • calls function process_dict_list() to extract the unique set of keys in case the value is a list of dictionaries;
  • leaves the value unchanged if it is a scalar.

The magic happens inside function convert_dict_list(), which:

  • collects all the keys (and values) of each dictionary of the passed list of dictionaries;
  • sorts them and creates a temporary dictionary mapping each tuple of keys into the corresponding list of converted values; such a dictionary contains - of course - unique tuple of keys, exactly as required;
  • finally the tuples of keys and list of values are converted back into expected dictionaries.

Function convert() is a simple interface function accepting a dictionary with the input data: it simply converts the dictionary into a list of dictionaries (with a single item) and calls the previous function to process it.

Below is the full code:

def convert(a_dict):
    return convert_dict_list([a_dict])[0]

def convert_dict_list(dict_list):
    sorted_items = (zip(*sorted(d.iteritems())) for d in dict_list)
    tmp_dict = dict((tuple(keys), map(convert_value, values))
        for (keys, values) in sorted_items)
    return map(dict, [zip(k, v) for (k, v) in tmp_dict.iteritems()])

def convert_value(val):
    return convert_dict_list(val) if isinstance(val, list) else val

And here is the output generated by using the example input:

>>> print convert(input)
{'mylist': [{'key3': 34554, 'key1': '34545'}, {'key2': [{'key2b': 875, 'key2a': 8765}], 'key1': '875'}]} 
Comments