AbreuFreire - 4 months ago
JSON Question

Use Python and JSON to recursively get all keys associated with a value

Given data organized in JSON format (see the example below), we need to get the path of keys and sub-keys associated with a given value. Using that example, when we receive the input "23314" we need to return the list: Fanerozoico, Cenozoico, Quaternario, Pleistocenico, Superior.

Since the data is in a JSON file, we decode it with Python's json module:

import json

def decode_crono(crono_file):
    # Parse the JSON file into nested Python dicts and hand them back to the caller
    with open(crono_file) as json_file:
        data = json.load(json_file)
    return data


But now we don't know how to process it to get what we need.
We can access keys like this:

k = data["Fanerozoico"]["Cenozoico"]["Quaternario"]["Pleistocenico"].keys()

or values like this:

v = data["Fanerozoico"]["Cenozoico"]["Quaternario"]["Pleistocenico"]["Superior"].values()

but this is still far from what we need.
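Chained indexing like the two lines above generalizes to any list of keys: functools.reduce with operator.getitem applies one lookup per key. This is a minimal sketch that assumes a trimmed copy of the sample data; get_by_path is a hypothetical helper name, not part of any library.

```python
import operator
from functools import reduce

# Trimmed copy of the sample data (assumption: same shape as the JSON below)
data = {
    "Fanerozoico": {
        "id": "20000",
        "Cenozoico": {
            "id": "23000",
            "Quaternario": {
                "id": "23300",
                "Pleistocenico": {
                    "id": "23310",
                    "Superior": {"id": "23314"},
                    "Medio": {"id": "23313"},
                },
            },
        },
    }
}


def get_by_path(d, path):
    """Hypothetical helper: follow a list of keys, like d[k1][k2]...[kn]."""
    return reduce(operator.getitem, path, d)


path = ["Fanerozoico", "Cenozoico", "Quaternario", "Pleistocenico", "Superior"]
print(get_by_path(data, path)["id"])  # prints 23314
```

This is the inverse of the problem (it follows a known path rather than finding one), but it shows that a plain list of keys is enough to address any node.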

{
    "Fanerozoico": {
        "id": "20000",
        "Cenozoico": {
            "id": "23000",
            "Quaternario": {
                "id": "23300",
                "Pleistocenico": {
                    "id": "23310",
                    "Superior": {
                        "id": "23314"
                    },
                    "Medio": {
                        "id": "23313"
                    },
                    "Calabriano": {
                        "id": "23312"
                    },
                    "Gelasiano": {
                        "id": "23311"
                    }
                }
            }
        }
    }
}

Answer

It's a little hard to understand exactly what you are after here, but it seems you have nested JSON and want to search it for an id, returning a list that represents the path down the nesting. If so, the quick and easy approach is to recurse on the dictionary you got from json.load and collect the keys as you go. When you find an 'id' key whose value matches the id you are searching for, you are done. Here is some code that does that:

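def all_keys(search_dict, key_id):
    """Return the keys leading to the dict whose 'id' equals key_id ([] if absent)."""
    def _all_keys(search_dict, key_id, keys=None):
        if not keys:
            keys = []
        for i in search_dict:
            # Found the target: the path ends with the matching 'id' key
            if i == 'id' and search_dict[i] == key_id:
                return keys + [i]
            # Descend into nested dicts, extending the path with this key
            if isinstance(search_dict[i], dict):
                potential_keys = _all_keys(search_dict[i], key_id, keys + [i])
                # Only a successful search ends with the 'id' key
                if 'id' in potential_keys:
                    keys = potential_keys
                    break
        return keys
    # Strip the trailing 'id' key so only the path of names remains
    return _all_keys(search_dict, key_id)[:-1]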

The reason for the nested function is to strip off the 'id' key that would otherwise be on the end of the list.
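If you would rather not strip a trailing 'id', a variant can compare each dict's own 'id' field and return the keys that led to it, using None to signal that nothing matched (so a miss is not confused with an empty path). This is a sketch assuming ids are unique and always stored under the key 'id'; find_path is a hypothetical name and the data is a trimmed copy of the question's.

```python
# Trimmed copy of the question's data (assumption: same shape)
data = {
    "Fanerozoico": {
        "id": "20000",
        "Cenozoico": {
            "id": "23000",
            "Quaternario": {
                "id": "23300",
                "Pleistocenico": {
                    "id": "23310",
                    "Superior": {"id": "23314"},
                    "Medio": {"id": "23313"},
                },
            },
        },
    }
}


def find_path(node, target_id, path=()):
    """Return the keys leading to the dict whose 'id' is target_id, or None."""
    # This dict is the target: the keys used to reach it are the answer
    if node.get("id") == target_id:
        return list(path)
    # Otherwise recurse into each nested dict, extending the path
    for key, value in node.items():
        if isinstance(value, dict):
            found = find_path(value, target_id, path + (key,))
            if found is not None:
                return found
    return None


print(find_path(data, "23314"))
# ['Fanerozoico', 'Cenozoico', 'Quaternario', 'Pleistocenico', 'Superior']
print(find_path(data, "99999"))  # None
```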

This is really just to give you an idea of what a solution might look like. Beware of Python's recursion limit (1000 frames by default in CPython) if your nesting is very deep!
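To sidestep the recursion limit entirely, the same depth-first search can use an explicit stack of (node, path) pairs instead of the call stack. A minimal sketch under the same assumptions (unique ids stored under 'id'); find_path_iterative is a hypothetical name.

```python
# Trimmed copy of the question's data (assumption: same shape)
data = {
    "Fanerozoico": {
        "id": "20000",
        "Cenozoico": {
            "id": "23000",
            "Quaternario": {
                "id": "23300",
                "Pleistocenico": {
                    "id": "23310",
                    "Superior": {"id": "23314"},
                    "Medio": {"id": "23313"},
                },
            },
        },
    }
}


def find_path_iterative(root, target_id):
    """Depth-first search with an explicit stack; returns the key path or None."""
    stack = [(root, [])]  # each entry: a dict and the keys used to reach it
    while stack:
        node, path = stack.pop()
        if node.get("id") == target_id:
            return path
        # Queue every nested dict together with its extended path
        for key, value in node.items():
            if isinstance(value, dict):
                stack.append((value, path + [key]))
    return None


print(find_path_iterative(data, "23314"))
# ['Fanerozoico', 'Cenozoico', 'Quaternario', 'Pleistocenico', 'Superior']
```

Because the stack is an ordinary list, nesting depth is bounded by memory rather than by the recursion limit. The trade-off is that siblings are visited in reverse order, which does not matter when ids are unique.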