sachin27 sachin27 - 16 days ago 4
Python Question

creating dynamic nested dictionary

I have in file abc.txt

abc/pqr/lmn/xyz:pass
abc/pqr/lmn/bcd:pass


i need to parse these statements and output should be in nested dictionary as below

{'abc':{'pqr':{'lmn':{'xyz':{'pass':1},{'bcd':{'pass':1}}}}}}


where 1 is pass count.
i can able to do as much as this

import re
d={}
p=re.compile('[a-zA-z]+')
for line in open('abc.txt'):
for key in p.findall(line):
d['key']={}


i am new to the python so can anyone help me through this

Answer

Here's an updated version of my answer in which leaves of the tree data-structure are now different from those in rest of it. Instead of the tree being strictly a dict-of-nested-dicts, the "leaves" on each branch are now instances of a different subclass of dict named collections.Counterwhich are useful for counting the number of times each of their keys occur. I did this because of your response to my question about what should happen if the last part of each line was something other than ":pass" (which was "we have to put new count for that key").

Nested dictionaries are often called Tree data-structures and can be defined recursively — the root is a dictionary as are the branches. The following uses a dict subclass instead of a plain dict because it makes constructing them easier since you don't need to special case the creation of the first branch of next level down (except I still do when adding the "leaves" because they are a different subclass, collections.Counter).

from collections import Counter
import re

# (optional) trick that redefines Counter subclass to print like a regular dict
class Counter(Counter):
    def __repr__(self):
        return dict(self).__repr__()

# borrowed from answer @ http://stackoverflow.com/a/19829714/355230
class Tree(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

# some functions based on answer @ http://stackoverflow.com/a/14692747/355230
def nested_dict_get(nested_dict, keys):
    return reduce(lambda d, k: d[k], keys, nested_dict)

def nested_dict_set(nested_dict, keys, value):
    nested_dict_get(nested_dict, keys[:-1])[keys[-1]] = value

def nested_dict_update_count(nested_dict, keys):
    if nested_dict_get(nested_dict, keys[:-1]):  # update existing Counter
        nested_dict_get(nested_dict, keys[:-1]).update([keys[-1]])
    else:                                        # create a new  Counter
        nested_dict_set(nested_dict, keys[:-1], Counter([keys[-1]]))

d = Tree()
pat = re.compile(r'[a-zA-z]+')
with open('abc.txt') as file:
    for line in file:
        nested_dict_update_count(d, [w for w in pat.findall(line.rstrip())])

print d  # prints like a regular dict

To test the leaf-counting capabilities of the revised code, I used the following test file which includes the same line twice, once ending again with :pass and another ending in :fail.

Expanded abc.txt test file:

abc/pqr/lmn/xyz:pass
abc/pqr/lmn/bcd:pass
abc/pqr/lmn/xyz:fail
abc/pqr/lmn/xyz:pass

Output:

{'abc': {'pqr': {'lmn': {'bcd': {'pass': 1}, 'xyz': {'fail': 1, 'pass': 2}}}}}

Please let me know if this is a correct interpretation of your comment about counting the last word on each line.

Comments