kit kit - 4 months ago
Python Question

How do I check the keys and values of a nested dictionary in Python?

I'm generating a nested dictionary in my program. After building it, I want to iterate through it and check each key and value.

Program code:

This is the dictionary I want to iterate over; each of its values is another dictionary:

main_dict = {101: {1234: [11111, 11111], 5678: [44444, 44444]},
             102: {9100: [55555, 55555], 1112: [77777, 88888]}}


I'm reading a CSV file and storing its contents in this dictionary. The file looks like this:

Input.csv:

lineno,item,total
101,1234,11111
101,1234,11111
101,5678,44444
101,5678,44444
102,9100,55555
102,9100,55555
102,1112,77777
102,1112,88888


This is the input CSV file. For each unique item, I want to know how many times each total is repeated.

To do that, I'm building the dictionary like this:

for line in reader:
    if line[0] in main_dict:
        if line[1] in main_dict[line[0]]:
            main_dict[line[0]][line[1]].append(line[2])
        else:
            main_dict[line[0]].update({line[1]:[line[2]]})
    else:
        main_dict[line[0]] = {line[1]:[line[2]]}

print main_dict


Output of the above program:

{101: {1234: [11111, 11111], 5678: [44444, 44444]},
 102: {9100: [55555, 55555], 1112: [77777, 88888]}}


But I get the following error on this line:

if line[1] in main_dict[line[0]]:
IndexError: list index out of range


This is how I iterate over main_dict:

for key,value in main_dict.iteritems():
    f1 = open(outputfile + op_directory +'/'+ key+'.csv', 'w')
    writer1 = csv.DictWriter(f1, delimiter=',', fieldnames = fieldname)
    writer1.writeheader()
    if type(value) == type({}):
        for k,v in value.iteritems():
            if type(v) == type([]):
                set1 = set(v)
                for se in set1:
                    writer1.writerow({'item':k,'total':se,'total_count':v.count(se)})


What is the best way to iterate over this kind of dictionary?

Sometimes I get the correct result, like the dictionary above, but often I get this error. What am I missing?

Thanks in advance!

Answer

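As the comments pointed out, you are not checking whether each line has exactly three fields. csv.reader returns a shorter list for a blank or truncated row, so line[0] or line[1] then raises IndexError. Skip those rows: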

for line in reader:
    if not len(line) == 3:
        continue
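To see where the IndexError comes from, here is a small Python 3 sketch. The sample text is hypothetical: it adds a blank line and a one-field row to the question's format, and io.StringIO stands in for the real file. csv.reader yields [] for the blank line and ['102'] for the truncated row, and the length check drops both.

```python
import csv
import io

# Hypothetical input: header, a good row, a blank line, a truncated row, a good row.
sample = "lineno,item,total\n101,1234,11111\n\n102\n102,9100,55555\n"

rows = list(csv.reader(io.StringIO(sample)))
print(rows)  # the blank line shows up as [] and the short row as ['102']

kept = []
for line in rows:
    if len(line) != 3:
        # Without this guard, line[0] fails on [] and line[1] fails on ['102'].
        continue
    kept.append(line)

print(kept)
```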

As for your algorithm, I would use a nested defaultdict to avoid the if/else branches.
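A minimal Python 3 sketch of the question's loop rewritten with a nested defaultdict, assuming the data comes from the Input.csv shown in the question (inlined here through io.StringIO). Because main_dict[lineno][item] is created as an empty list on first access, the three branches collapse into a single append. Note that csv.reader returns strings, so the keys are '101' and '1234' rather than the integers shown in the question.

```python
import csv
import io
from collections import defaultdict

# The question's Input.csv, inlined so the sketch runs on its own.
sample = """lineno,item,total
101,1234,11111
101,1234,11111
101,5678,44444
101,5678,44444
102,9100,55555
102,9100,55555
102,1112,77777
102,1112,88888
"""

# main_dict[lineno][item] defaults to an empty list on first access.
main_dict = defaultdict(lambda: defaultdict(list))

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
for line in reader:
    if len(line) != 3:
        continue  # skip blank or malformed rows
    lineno, item, total = line
    main_dict[lineno][item].append(total)

# Convert to plain dicts for a readable print.
print({lineno: dict(items) for lineno, items in main_dict.items()})
```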

EDIT: After the question was edited, I added a second defaultdict and the CSV-writing part:

from collections import defaultdict
import csv

counter = defaultdict(lambda: defaultdict(list))
main_dict= defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
fieldnames=['item', 'total', 'total_count']

# reader below is a csv.reader object over input.csv
with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for line in reader:
        if not len(line) == 3:
            continue
        # Remove unwanted spaces
        lineno, item, total = [el.strip() for el in line]
        # Skip non-numeric entries (the header row, for example)
        if not lineno.isdigit():
            continue
        counter[lineno][item].append(total)
        csvdict = {'item': item,
                   'total': total,
                   'total_count': counter[lineno][item].count(total)}
        main_dict[lineno][item][total].update(csvdict)

# The writing part
for lineno in sorted(main_dict):
    itemdict = main_dict[lineno]
    output = 'output_%s.csv' % lineno
    with open(output, 'wb') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=',')
        writer.writeheader()
        for totaldict in itemdict.values():
            for csvdict in totaldict.values():
                writer.writerow(csvdict)
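The code above recounts the list with .count() on every row. As an alternative (swapping in collections.Counter, which is not what the code above uses), here is a Python 3 sketch that counts each (item, total) pair once per lineno and then writes the item,total,total_count rows. To keep it self-contained it writes into in-memory io.StringIO buffers instead of the output_<lineno>.csv files, and the input is the question's Input.csv inlined.

```python
import csv
import io
from collections import Counter, defaultdict

sample = """lineno,item,total
101,1234,11111
101,1234,11111
101,5678,44444
101,5678,44444
102,9100,55555
102,9100,55555
102,1112,77777
102,1112,88888
"""

# counts[lineno] maps (item, total) -> number of occurrences.
counts = defaultdict(Counter)
for line in csv.reader(io.StringIO(sample)):
    if len(line) != 3:
        continue
    lineno, item, total = [el.strip() for el in line]
    if not lineno.isdigit():
        continue  # skips the header row
    counts[lineno][(item, total)] += 1

# One in-memory "file" per lineno instead of output_<lineno>.csv.
fieldnames = ['item', 'total', 'total_count']
outputs = {}
for lineno in sorted(counts):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for (item, total), n in sorted(counts[lineno].items()):
        writer.writerow({'item': item, 'total': total, 'total_count': n})
    outputs[lineno] = buf.getvalue()

print(outputs['102'])
```

Each row is counted once, so there is no repeated .count() scan; to write real files, open output_<lineno>.csv with newline='' in place of the StringIO buffer.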

You can then use the following function to print a readable representation of the result:

def myprint(obj, ntab=0):
    if isinstance(obj, (dict, defaultdict)):
        for k in sorted(obj):
            myprint('%s%s'%(ntab*' ', k), ntab+1)
            myprint(obj[k], ntab+1)
    else:
        print('%s%s'%(ntab*' ', obj))
myprint(main_dict)
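myprint walks the structure recursively. The same idea can answer the original "how do I iterate" question as a generator that yields every leaf together with its key path, so the caller gets lineno, item and the list of totals in one flat loop. iter_leaves is a hypothetical helper written for this sketch (Python 3), and main_dict is written out by hand with string keys, as csv.reader would produce.

```python
def iter_leaves(obj, path=()):
    """Yield (key_path, value) for every non-dict value in a nested dict."""
    if isinstance(obj, dict):  # also matches defaultdict, a dict subclass
        for key in sorted(obj):
            yield from iter_leaves(obj[key], path + (key,))
    else:
        yield path, obj


# Same shape as main_dict in the question, with string keys.
main_dict = {'101': {'1234': ['11111', '11111'], '5678': ['44444', '44444']},
             '102': {'9100': ['55555', '55555'], '1112': ['77777', '88888']}}

for (lineno, item), totals in iter_leaves(main_dict):
    print(lineno, item, totals)
```

Unpacking (lineno, item) assumes every leaf sits exactly two levels deep, which holds for the dictionary in the question.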

But if you want to count the item totals, I would use another defaultdict with the total as the key and a list of (lineno, item) tuples as the value:

from collections import defaultdict
import csv

total_dict = defaultdict(list)

# reader below is a csv.reader object over input.csv
with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for line in reader:
        if not len(line) == 3:
            continue
        # Remove unwanted spaces
        lineno, item, total = [el.strip() for el in line]
        # Skip non-numeric entries (the header row, for example)
        if not lineno.isdigit():
            continue
        total_dict[total].append((lineno, item))

You can then get the number of occurrences of each total easily:

>>> print len(total_dict['55555'])
2
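Building on total_dict, a short Python 3 sketch that computes the overall count for every total at once, and, using collections.Counter on the (lineno, item) tuples, how often each total repeats for each item. Here total_dict is filled from the question's rows written out by hand instead of being read from input.csv.

```python
from collections import Counter, defaultdict

# The question's data rows, written out by hand.
rows = [('101', '1234', '11111'), ('101', '1234', '11111'),
        ('101', '5678', '44444'), ('101', '5678', '44444'),
        ('102', '9100', '55555'), ('102', '9100', '55555'),
        ('102', '1112', '77777'), ('102', '1112', '88888')]

total_dict = defaultdict(list)
for lineno, item, total in rows:
    total_dict[total].append((lineno, item))

# How many times each total occurs overall.
total_counts = {total: len(pairs) for total, pairs in total_dict.items()}

# How many times each total occurs for each (lineno, item).
per_item = {total: Counter(pairs) for total, pairs in total_dict.items()}

print(total_counts['55555'])               # 2
print(per_item['11111'][('101', '1234')])  # 2
```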