kakk11 - 8 days ago
Python Question

dataframe from dictionary of lists

Is there a way to achieve the following with purely pandas methods, or is it actually more reasonable to rearrange the dictionary itself first?

Initial dictionary:



data_json = {'a': [{'aa': 1, 'bb': 2, 'cc': 3},
                   {'aa': 2, 'bb': 2, 'cc': 3},
                   {'aa': 3, 'bb': 2, 'cc': 3}],
             'b': [{'beta': 22, 'alpha': 23, 'gamma': 24},
                   {'gamma': 24, 'beta': 25, 'alpha': 26},
                   {'alpha': 34, 'beta': 35, 'gamma': 36}]}


I would like to get a dataframe whose column names are the keys of the nested dictionaries:

   aa  bb  cc  alpha  beta  gamma
1   1   2   3     23    22     24
2   2   2   3     26    25     24
3   3   2   3     34    35     36


Trying:

aaa = pd.DataFrame(data_json)
foo = lambda x: pd.Series([i for i in x.items()])
bbb=pd.concat([aaa['a'].apply(foo),aaa['b'].apply(foo)],axis=1)


Gives me

   0  1  2   0   1   2
1  1  2  3  23  22  24
2  2  2  3  26  25  24
3  3  2  3  34  35  36


But now I'm stuck because the column names are duplicated [0,1,2,0,1,2], so I cannot simply use:

bbb.rename(columns={0:'a',1:'b',...})


As I said, I do not mind reordering the initial dictionary, but I'd like the whole thing to be as clean as possible.

Answer

I would load both 'a' and 'b' separately and join them (merge them on index):

pd.DataFrame(data_json['a']).join(pd.DataFrame(data_json['b']))


   aa  bb  cc  alpha  beta  gamma
0   1   2   3     23    22     24
1   2   2   3     26    25     24
2   3   2   3     34    35     36
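
For reference, here is a self-contained sketch of the join approach, assuming a recent pandas. Two details are my own additions rather than part of the answer above: the explicit column selection (recent pandas versions keep the keys in first-seen order, so 'beta' may come before 'alpha' unless you reorder), and the `rsuffix` example, which uses a hypothetical name clash to show what `join` needs when both frames share column names.

```python
import pandas as pd

data_json = {
    'a': [{'aa': 1, 'bb': 2, 'cc': 3},
          {'aa': 2, 'bb': 2, 'cc': 3},
          {'aa': 3, 'bb': 2, 'cc': 3}],
    'b': [{'beta': 22, 'alpha': 23, 'gamma': 24},
          {'gamma': 24, 'beta': 25, 'alpha': 26},
          {'alpha': 34, 'beta': 35, 'gamma': 36}],
}

# Each list of dicts becomes its own frame; the dict keys become column names.
left = pd.DataFrame(data_json['a'])
right = pd.DataFrame(data_json['b'])

# join aligns the two frames on their (default integer) index.
result = left.join(right)

# Assumption: we want a fixed column order regardless of pandas version.
result = result[['aa', 'bb', 'cc', 'alpha', 'beta', 'gamma']]
print(result)

# Hypothetical clash: joining frames that share column names requires a suffix.
clash = left.join(left, rsuffix='_b')
print(list(clash.columns))
```

Without `lsuffix` or `rsuffix`, `join` raises a ValueError when column names overlap, so this approach relies on the nested keys being distinct across 'a' and 'b'.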

Another way, in case you don't know in advance how many keys data_json has, is to build the frames in a loop and combine them with pd.concat, which is more convenient with a list. Note that I'm using sorted(data_json) so that a comes before b:

list_df = []
for k in sorted(data_json):
    list_df.append(pd.DataFrame(data_json[k]))
pd.concat(list_df, axis=1)
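
A runnable variant of the loop above, written as a list comprehension, might look like the sketch below. The `keys=` call is an optional extra that the answer does not use: it labels each block of columns with the dictionary key it came from, which could help if the nested keys ever collide. The names `frames`, `flat` and `labelled` are only illustrative.

```python
import pandas as pd

data_json = {
    'a': [{'aa': 1, 'bb': 2, 'cc': 3},
          {'aa': 2, 'bb': 2, 'cc': 3},
          {'aa': 3, 'bb': 2, 'cc': 3}],
    'b': [{'beta': 22, 'alpha': 23, 'gamma': 24},
          {'gamma': 24, 'beta': 25, 'alpha': 26},
          {'alpha': 34, 'beta': 35, 'gamma': 36}],
}

# Sort the outer keys so the 'a' columns come before the 'b' columns.
keys = sorted(data_json)
frames = [pd.DataFrame(data_json[k]) for k in keys]

# Same result as the loop: frames placed side by side, aligned on the index.
flat = pd.concat(frames, axis=1)
print(flat)

# Optional: keep the outer key as a top column level (MultiIndex columns).
labelled = pd.concat(frames, axis=1, keys=keys)
print(labelled['b']['alpha'].tolist())
```

With `keys=`, a column is addressed as `labelled['b']['alpha']`, so duplicate inner names under different outer keys stay distinguishable.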