neuron neuron - 1 year ago 51
Python Question

Convert flat CSV to JSON when some fields need to be nested

I have a flat csv file with 50 columns (let's call them FirstName, LastName, Address, etc.), that's tab-delimited with quotes around all fields.

I need to convert this to a JSON file, but what's tricky is that some of the CSV columns need to be converted into nested fields, where the nested fields also contains, apart from the column's row value, certain generic fields and values (this is for an API that has required fields). I know in advance which of the columns need to become nested fields.

So, for simplicity's sake, let's say this is the structure of the first row for the first 3 columns in the CSV file:

FirstName LastName Address
John Doe 21 Python Street

This is the desired JSON output:

"FirstName": "John",
"LastName": "Doe",
"Shipping Details": [
"Generic Field 1": "Generic Value 1",
"Generic Field 2": "Generic Value 2",
"Address": "21 Python Street"

In the full CSV with 50 columns, I also have a few more columns that I need to parse a nested fields, with these extra generic values.

How can I go about doing this?

Answer Source

Use a DictReader and manipulate the row by adding the Shipping Details and removing the Address.

j = []

with open("/tmp/so.csv") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        # Add 'Shipping Details'  to row.
        # Note that something like this will have to be done
        # for *every* column you want to change.
        row["Shipping Details"] = {
          "Generic Field 1": "Generic Value 1",
          "Generic Field 2": "Generic Value 2",
          "Address": row["Address"]}

        # We don't need the 'Address' anymore.

        # Collect the changed row in the list of rows.

Output (after lint):

    "LastName": "Doe",
    "Shipping Details": {
        "Address": "21 Python Street",
        "Generic Field 1": "Generic Value 1",
        "Generic Field 2": "Generic Value 2"
    "FirstName": "John"