wilbev wilbev - 1 month ago 8
Python Question

Import single csv file with one data field going to separate table

I have a csv file containing a list of

Products
with the following 4 fields

Product ID, Name, Alias, UOM


I would like to import the file into a database that has 2 tables. The first is the
Product
table:

Product ID, Name, UOM


The second is the
Product Alias
table:

Product ID, Alias


Each
Product ID
may have 0 to many alias names.

Is there any way I could treat the
Alias
column different by the fact it has a different separator between the commas, like ";" or a period "." to separate the 0 to many alias names for a given Product ID?

Therefore during the csv import, when it gets the 3rd comma, it would import that data into the 2nd table but import a new record with repeating Product ID's for as many alias names are in that comma field.

Hopefully I explained that well enough, let me if I didn't. I'm more interested in the possible processing of doing this regardless of what code that is being using but python would be the preferred route.

Example data

ProductID, Name, Alias, UOM
122, Widget1, W1; Wid1;Wt1, Each
123, Widget2, , Each
124, Widget3, W3; Wt3, Each

Answer

This is pretty easy to do.

First, you use csv as usual to get all the columns. One of those columns will itself be a semicolon-separated list of values.

If you don't need to worry about quoting, etc., you can just use split. For example:

with open('foo.csv') as f:
    for row in csv.DictReader(f):
        cursor.execute('''INSERT INTO Product (ProductID, Name, UOM) 
                          VALUES (:ProductID, :Name, :UOM)''', row)
        for alias in row['Alias'].split(';'):
            cursor.execute('''INSERT INTO ProductAlias (Name, Alias) 
                              VALUES (?, ?)''', row['Name'], alias)

If you can have spaces within the aliases, you're going to need two kinds of quoting or escaping, alongside the two kinds of delimiters. Like this:

123,"Widget 1","Widget1;W1;Wid1;Wt1;'W 1'",Each

And you'd want to parse it by using another csv.reader. Of course csv is intended to parse a sequence of lines into a sequence of rows, not a single line into a single row, but that's easy to deal with by just wrapping the single line in a list, and extracting the single row from the result. For example, instead of this:

row['Alias'].split(';')

… do this:

next(csv.reader([row['Alias']], delimiter=";", quotechar="'"))

(Although you'd probably want to split up the expression into multiple steps once it gets this complicated.)