Federico C - 1 year ago 89

Python Question

I'm trying to create a DataFrame from a nested dictionary, where the values are in comma separated strings.

Each value is nested in a dict, such as:

`dict = {"1":{`

"event":"A, B, C"},

"2":{

"event":"D, B, A, C"},

"3":{

"event":"D, B, C"}

}

My desired output is:

`A B C D`

0 A B C NaN

1 A B C D

2 NaN B C D

All I have so far is converting the dict to dataframe and splitting the items in each list. But I'm not sure this is getting me any closer to my objective.

`df = pd.DataFrame(dict)`

Out[439]:

1 2 3

event A, B, C D, B, A, C D, B, C

In [441]: df.loc['event'].str.split(',').apply(pd.Series)

Out[441]:

0 1 2 3

1 A B C NaN

2 D B A C

3 D B C NaN

Any help is appreciated. Thanks

Answer Source

From what you have(modified the split a little to strip the extra spaces) `df1`

, you can probably just `stack`

the result and use `pd.crosstab()`

on the index and value column:

```
df1 = df.loc['event'].str.split('\s*,\s*').apply(pd.Series)
df2 = df1.stack().rename('value').reset_index()
pd.crosstab(df2.level_0, df2.value)
# value A B C D
# level_0
# 1 1 1 1 0
# 2 1 1 1 1
# 3 0 1 1 1
```

This is not exactly as you asked for, but I imagine you may prefer this to your desired output.

To get exactly what you are looking for, you can add an extra column which is equal to the value column above and then unstack the index that contains the values:

```
df2 = df1.stack().rename('value').reset_index()
df2['value2'] = df2.value
df2.set_index(['level_0', 'value']).drop('level_1', axis = 1).unstack(level = 1)
# value2
# value A B C D
# level_0
# 1 A B C None
# 2 A B C D
# 3 None B C D
```