Brian C. Brian C. - 8 months ago 50
Python Question

Using Python CSV DictReader to create multi-level nested dictionary

Total Python noob here, probably missing something obvious. I've searched everywhere and haven't found a solution yet, so I thought I'd ask for some help.

I'm trying to write a function that will build a nested dictionary from a large csv file. The input file is in the following format:

blue widget,5,4,sony,
red widget,6,5,sony,
green widget,7,5,microsoft,
purple widget,7,6,microsoft,


The output dictionary I need would look like:

projects = { `<Brand>`: { `<Product>`: { 'Price': `<Price>`, 'Cost': `<Cost>` },},}

But obviously with many different brands containing different products. In the input file, the data is ordered alphabetically by brand name, but I know that it becomes unordered as soon as DictReader executes, so I definitely need a better way to handle the duplicates. The if statement as written is redundant and unnecessary.

Here's the non-working, useless code I have so far:

def build_dict(source_file):
projects = {}
headers = ['Product', 'Price', 'Cost', 'Brand']
reader = csv.DictReader(open(source_file), fieldnames = headers, dialect = 'excel')
current_brand = 'None'
for row in reader:
if Brand != current_brand:
current_brand = Brand
projects[Brand] = {Product: {'Price': Price, 'Cost': Cost}}
return projects

source_file = 'merged.csv'
print build_dict(source_file)

I have of course imported the csv module at the top of the file.

What's the best way to do this? I feel like I'm way off course, but there is very little information available about creating nested dicts from a CSV, and the examples that are out there are highly specific and tend not to go into detail about why the solution actually works, so as someone new to Python, it's a little hard to draw conclusions.

Also, the input csv file doesn't normally have headers, but for the sake of trying to get a working version of this function, I manually inserted a header row. Ideally, there would be some code that assigns the headers.

Any help/direction/recommendation is much appreciated, thanks!

import csv
from collections import defaultdict

def build_dict(source_file):
    projects = defaultdict(dict)
    headers = ['Product', 'Price', 'Cost', 'Brand']
    with open(source_file, 'rb') as fp:
        reader = csv.DictReader(fp, fieldnames=headers, dialect='excel',
        for rowdict in reader:
            if None in rowdict:
                del rowdict[None]
            brand = rowdict.pop("Brand")
            product = rowdict.pop("Product")
            projects[brand][product] = rowdict
    return dict(projects)

source_file = 'merged.csv'
print build_dict(source_file)


{'microsoft': {'green widget': {'Cost': '5', 'Price': '7'},
               'purple widget': {'Cost': '6', 'Price': '7'}},
 'sony': {'blue widget': {'Cost': '4', 'Price': '5'},
          'red widget': {'Cost': '5', 'Price': '6'}}}

from your input data (where merged.csv doesn't have the headers, only the data.)

I used a defaultdict here, which is just like a dictionary but when you refer to a key that doesn't exist instead of raising an Exception it simply makes a default value, in this case a dict. Then I get out -- and remove -- Brand and Product, and store the remainder.

All that's left I think would be to turn the cost and price into numbers instead of strings.

[modified to use DictReader directly rather than reader]