horcle_buzz - 3 months ago
Python Question

How to best extract sub dictionaries by value in this object?

I am dealing with a database in which someone created a PHP ArrayObject with virtually no checks on the data going into it.

I am able to extract this as a dictionary of dictionaries using the unserialize function of the Python phpserialize library, so that it looks like this:

{0: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
1: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "2"}}',
2: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}, "2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "4": {"0": "environmental_stewardship", "1": 4}}',
3: '{"0": {"0": "location", "1": "4"}, "1": {"0": "sizing", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "4"}}',
4: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": 0}, "4": {"0": "environmental_stewardship", "1": "2"}}',
5: '{"0": {"0": "location", "1": "3"}, "1": {"0": "sizing", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "3"}}',

...

56: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "1"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
57: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 2}, "2": {"0": "design", "1": 1}, "3": {"0": "maintenance", "1": 2}, "4": {"0": "environmental_stewardship", "1": 2}}',
58: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "4"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
59: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
60: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
61: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
62: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "1"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
63: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
64: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
65: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}'}


The problem is that I need a way to extract the sub dictionaries that have the same value (e.g., all those with "visual_impact" or "color", etc.). However, since these sub dictionaries are not paired with the same key throughout the object, this does not seem possible.
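Because the position of a category changes from record to record, one option is to search each record's sub dictionaries by their "0" value instead of relying on the key. A minimal sketch, assuming each record is first parsed from its string form with json.loads; find_by_name is a hypothetical helper name:

```python
import json

# One record from the dump above; the values are strings, so parse it first.
raw = ('{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}, '
       '"2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, '
       '"4": {"0": "environmental_stewardship", "1": 4}}')
record = json.loads(raw)


def find_by_name(record, name):
    """Return the first sub dictionary whose "0" entry equals name, or None."""
    return next((sub for sub in record.values() if sub["0"] == name), None)


print(find_by_name(record, "design"))  # {'0': 'design', '1': 4}
```

This works no matter which key a category happens to sit under, so no re-keying is needed just to look one up.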

I am thinking that reassigning the keys so that they line up with the values might work.

So, for example, this:

dict = {"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}, "2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "4": {"0": "environmental_stewardship", "1": 4}}


would instead become:

dict = {"0": {"0": "color", "1": 3}, "4": {"0": "plant_variety", "1": 3}, "1": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "2": {"0": "environmental_stewardship", "1": 4}}


Thus, I want dict["0"] to always contain "color", dict["1"] to always contain "design", etc. So, for my example dict above, dict["0"] would give {"0": "color", "1": 3}, dict["1"] would give {"0": "design", "1": 4}, etc.

In other words, I am trying to reassign the keys based on what is in each sub dictionary, so that throughout the whole dictionary of dictionaries above, key "0" always holds "color", key "1" always holds "design", etc.
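If fixed keys are really what you want, the re-keying can be driven by a lookup table from category name to key. A minimal sketch, assuming the names are already normalized; CANONICAL_KEYS and rekey are hypothetical names, and the table is taken from the desired output in the example above:

```python
# Hypothetical name-to-key table, copied from the desired output in the example.
CANONICAL_KEYS = {
    "color": "0",
    "design": "1",
    "environmental_stewardship": "2",
    "maintenance": "3",
    "plant_variety": "4",
}


def rekey(record):
    """Return a new dict whose keys depend on each sub dictionary's "0" value."""
    # Raises KeyError for a name missing from the table, which flags unnormalized data.
    return {CANONICAL_KEYS[sub["0"]]: sub for sub in record.values()}


example = {"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3},
           "2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4},
           "4": {"0": "environmental_stewardship", "1": 4}}
fixed = rekey(example)
print(fixed["1"])  # {'0': 'design', '1': 4}
```

A simpler variant skips the numeric table and keys by the name itself, {sub["0"]: sub for sub in record.values()}, so that fixed["design"] is the lookup.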

I found the question change-the-name-of-a-key-in-dictionary, but I am not sure how to apply it to this object, since the new key depends on the content of each sub dictionary.

I know that I first have to make sure the values are named uniformly (for example, changing 'use_of_color' to 'color'), but that should not be a problem. I just need a way to make sure I always extract the sub dictionary with the value 'color' by the same key, and the only way I can see to do that is by reassigning the keys.
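The renaming step can be a small alias table applied to every sub dictionary. A minimal sketch; ALIASES and normalize are hypothetical names, and only the 'use_of_color' to 'color' entry comes from the question, so further entries are left for you to add:

```python
# Hypothetical alias table; only 'use_of_color' -> 'color' is from the question.
ALIASES = {"use_of_color": "color"}


def normalize(record):
    """Return a copy of record with each category name mapped through ALIASES."""
    return {
        key: {**sub, "0": ALIASES.get(sub["0"], sub["0"])}  # unknown names pass through
        for key, sub in record.items()
    }


record = {"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "design", "1": "2"}}
print(normalize(record))
# {'0': {'0': 'color', '1': '3'}, '1': {'0': 'design', '1': '2'}}
```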

If there is a better way to deal with this, I am open to suggestions.

Answer

I'm assuming you want to group the sub-dictionaries by the value of their key '0', i.e. by 'location', 'environmental_stewardship', etc. But you don't actually have sub-dictionaries at all; you have strings that are dictionary literals. If your dictionary were named horrible_mess, you could use this quick hack:

>>> from ast import literal_eval
>>> still_messy = {k: literal_eval(v) for k, v in horrible_mess.items()}
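The strings in the dump use double-quoted keys and plain numbers, so they also appear to be valid JSON, and json.loads from the standard library should parse them just as well, while accepting only JSON rather than arbitrary Python literals. A minimal sketch, where horrible_mess is a stand-in built from shortened excerpts of two entries in the question:

```python
import json

# Stand-in data: shortened excerpts of entries 2 and 3 from the question's dump.
horrible_mess = {
    2: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}}',
    3: '{"0": {"0": "location", "1": "4"}, "1": {"0": "sizing", "1": "4"}}',
}

# Parse every string into a real dict of dicts.
still_messy = {k: json.loads(v) for k, v in horrible_mess.items()}
print(still_messy[3]["0"])  # {'0': 'location', '1': '4'}
```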

Then, it's probably easiest to simply do the following:

>>> from collections import defaultdict
>>> grouped = defaultdict(list)
>>> for sub in still_messy.values():
...     for d in sub.values():
...         grouped[d['0']].append(d)
... 
>>> grouped['visual_appeal']
[{'1': '4', '0': 'visual_appeal'}, {'1': '3', '0': 'visual_appeal'}]
>>> grouped['environmental_stewardship']
[{'1': '3', '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': 4, '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': 2, '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}]
>>>
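Note that the dump mixes string scores ("3") and integer scores (3), for example in entries 2 and 57, so the grouped lists above contain both types. A minimal sketch that groups on normalized names and casts each score with int(), assuming every score is either an integer or a digit string; group_scores and ALIASES are hypothetical names, and horrible_mess is stand-in data shortened from the question:

```python
import json
from collections import defaultdict

# Stand-in data: shortened excerpts of entries 0, 2 and 4 from the question's dump.
horrible_mess = {
    0: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "design", "1": "2"}}',
    2: '{"0": {"0": "color", "1": 3}, "1": {"0": "design", "1": 4}}',
    4: '{"0": {"0": "maintenance", "1": 0}, "1": {"0": "design", "1": "3"}}',
}

# Hypothetical alias table for names that mean the same category.
ALIASES = {"use_of_color": "color"}


def group_scores(raw):
    """Map each normalized category name to a list of integer scores."""
    grouped = defaultdict(list)
    for text in raw.values():
        for sub in json.loads(text).values():
            name = ALIASES.get(sub["0"], sub["0"])
            grouped[name].append(int(sub["1"]))  # scores appear as both "3" and 3
    return dict(grouped)


scores = group_scores(horrible_mess)
print(scores["design"])  # [2, 4, 3]
print(scores["color"])   # [3, 3]
```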