view raw
Federico C Federico C - 8 months ago 57
Python Question

pandas: dataframe from dict with comma separated values

I'm trying to create a DataFrame from a nested dictionary, where the values are in comma separated strings.

Each value is nested in a dict, such as:

dict = {"1":{
"event":"A, B, C"},
"event":"D, B, A, C"},
"event":"D, B, C"}

My desired output is:

0 A B C NaN
1 A B C D
2 NaN B C D

All I have so far is converting the dict to dataframe and splitting the items in each list. But I'm not sure this is getting me any closer to my objective.

df = pd.DataFrame(dict)
1 2 3
event A, B, C D, B, A, C D, B, C

In [441]: df.loc['event'].str.split(',').apply(pd.Series)
0 1 2 3
1 A B C NaN
2 D B A C
3 D B C NaN

Any help is appreciated. Thanks


From what you have(modified the split a little to strip the extra spaces) df1, you can probably just stack the result and use pd.crosstab() on the index and value column:

df1 = df.loc['event'].str.split('\s*,\s*').apply(pd.Series) 

df2 = df1.stack().rename('value').reset_index()
pd.crosstab(df2.level_0, df2.value)

#   value   A   B   C   D
# level_0               
#       1   1   1   1   0
#       2   1   1   1   1
#       3   0   1   1   1

This is not exactly as you asked for, but I imagine you may prefer this to your desired output.

To get exactly what you are looking for, you can add an extra column which is equal to the value column above and then unstack the index that contains the values:

df2 = df1.stack().rename('value').reset_index()
df2['value2'] = df2.value
df2.set_index(['level_0', 'value']).drop('level_1', axis = 1).unstack(level = 1)

#         value2
#   value   A     B     C     D
# level_0               
#       1   A     B     C  None
#       2   A     B     C     D
#       3   None  B     C     D