user2676049 user2676049 - 1 month ago 14
Python Question

Building nested dataframe or dictionary from csv file for pyupset

I have a csv file in the following format,

Type_A, Type_B, Type_C

I need to build a nested dictionary or dataframe in the following format

ID Value
1 x1
2 y1
ID Value
1 x2
2 y2
ID Value
1 x3
2 y3

In my first attempt, this is what I did

import pandas as pd

Type_A Type_B Type_C
0 x1 x2 x3
1 y1 y2 y3

I though all I need to do is read each column in a list and then add index before combining it in a dictionary or nested dataframe

so this is what I did to store them in a list.

import pandas as pd

then to add index used enumerate ()

d1_df=list(enumerate(d1, 1))
d2_df=list(enumerate(d2, 1))
d3_df=list(enumerate(d3, 1))

d1_df # this gives me [(1, 'x1'), (2, 'y1')]

Now next I added lables Id and Value to dataframe

labels = ['Id','Value']
d1_df = pd.DataFrame.from_records(d1_df, columns=labels)
d2_df = pd.DataFrame.from_records(d2_df, columns=labels)
d3_df = pd.DataFrame.from_records(d3_df, columns=labels)

d1_df # this gives me Id Value
# 0 1 x1
# 1 2 y1

Now I need to somehow nest them together in a dictionary

Answer Source

First, instead of your

d1_dict=list(enumerate(d1, 1))
d2_dict=list(enumerate(d2, 1))
d3_dict=list(enumerate(d3, 1))


lst    = 3 * [None]            # Creating a list with 3 temporary items
lst[0] = list(enumerate(d1, 1))
lst[1] = list(enumerate(d2, 1))
lst[2] = list(enumerate(d3, 1))

(i.e. the list of lists; I chose the more appropriate name lst instead of your dx_dict).

Then create your nested dictionary:

result = {}
for col, id, lt in zip(df.columns, range(0, 3), lst):
    result[col] = {'ID': id}  # Creating column_name key and assigning inner dict to it
    result[col].update(lt)    # Adding key:value pairs from the list of pairs to inner dict

The value of the result will be

{'Type_A': {'ID': 0, 1: 'x1', 2: 'y1'},
'Type_B': {'ID': 1, 1: ' x2', 2: ' y2'},
'Type_C': {'ID': 2, 1: ' x3', 2: ' y3'}}

(I didn't know which value you want for the ID key, so I used range(0, 3) for producing the sequential numbers, and I used numbered x, y in my CSV file to differentiate between columns.)