Jinstrong Jinstrong - 5 months ago 13
Python Question

save dataframe to excel when each value is a dict

Currently I am working on a project to read multiple excel sheet, parse the data, and save the combined dataframe back to an excel file.

I read multiple excel table to dataframe, transform each to nested dict, and then combine all to a big nested dict (3-dimension of data), and at last transform back to dataframe.

I have finished combining the data to a big dataframe. However, each value of the dataframe is a dict, thus after saving to excel using to_excel, the format is not good(as you can see below). What I want is to re-organize the data, so each excel cell only contains a key, an index item, or a single value.

Here is the code I used to save data to excel:

with ExcelWriter(dump_excel) as writer:
DataFrame(DataFrame(new_dict)).to_excel(writer, sheet_name='yield_all')


enter image description here

What I would want to get is something like below:
enter image description here

Answer

IIUC you can remove dict with DataFrame constructor and list comprehension:

print (df)
                                             BIN2_R1
0  {0.0: 23, 1.0: 31, 'yield': '13.01%', 'total':...
1  {0.0: 81, 1.0: 70, 'yield': '36.01%', 'total':...

print (pd.DataFrame([x for x in df.BIN2_R1]))
   0.0  1.0   yield  total
0   23   31  13.01%     54
1   81   70  36.01%    151

EDIT:

You can use concat:

df1 = pd.concat([(pd.DataFrame([x for x in df.BIN2_R1])),
                 (pd.DataFrame([x for x in df.FT]))], axis=1, keys=['BIN2_R1','FT'])

print (df1)
  BIN2_R1                    FT                    
      0.0 1.0   yield total 0.0   1.0   yield total
0      23  31  13.01%    54  82  6517  92.70%  6599
1      81  70  36.01%   151  51   173   0.53%    13

df1.to_excel('test.xlsx')

More general solution, if all columns contain dictionaries:

dfs = [pd.DataFrame([x for x in df[col]]) for col in df.columns]
df1 = pd.concat(dfs, axis=1, keys=df.columns)
print (df1)

  BIN2_R1                    FT                    
      0.0 1.0   yield total 0.0   1.0   yield total
0      23  31  13.01%    54  82  6517  92.70%  6599
1      81  70  36.01%   151  51   173   0.53%    13

df1.to_excel('test.xlsx')

EDIT:

The main problem is that inner dict are not dict, but string. So I have to convert them. Converting is impossible with NaN, so I fillna by {}:

import pandas as pd
import ast


d = {1: {u'FT1': u"{0.0: 19732, 1.0: 20495, 'total': 40227, 'yield': '93.34%'}", u'FT3': u"{0.0: 9285, 1.0: 9629, 'total': 18914, 'yield': '92.93%'}", u'FT2': u"{0.0: 1412, 1.0: 1480, 'total': 2892, 'yield': '93.87%'}", u'FT': u"{0.0: 82, 1.0: 6517, 'total': 6599, 'yield': '92.70%'}", u'FT_R1': u"{0.0: 1262, 1.0: 1418, 'total': 2680, 'yield': '53.73%'}", u'QA_R2': u"{0.0: 2, 'total': 2, 'yield': '100.00%'}", u'QA_R1': u"{0.0: 6, 'total': 6, 'yield': '75.00%'}", u'QA': u"{1.0: 750, 'total': 750, 'yield': '98.94%'}", u'BIN2_R1': u"{0.0: 23, 1.0: 31, 'total': 54, 'yield': '13.01%'}"}, 2: {u'FT1': u"{0.0: 246, 1.0: 110, 'total': 356, 'yield': '0.83%'}", u'FT3': u"{0.0: 81, 1.0: 54, 'total': 135, 'yield': '0.66%'}", u'FT2': u"{0.0: 9, 1.0: 3, 'total': 12, 'yield': '0.39%'}", u'FT': u"{0.0: 51, 1.0: 173, 'total': 224, 'yield': '3.15%'}", u'FT_R1': u"{0.0: 138, 1.0: 86, 'total': 224, 'yield': '4.49%'}", u'QA_R1': u"{0.0: 1, 'total': 1, 'yield': '12.50%'}", u'QA': u"{1.0: 5, 'total': 5, 'yield': '0.66%'}", u'BIN2_R1': u"{0.0: 81, 1.0: 70, 'total': 151, 'yield': '36.39%'}"}, 3: {u'FT1': u"{0.0: 72, 1.0: 47, 'total': 119, 'yield': '0.28%'}", u'FT3': u"{0.0: 35, 1.0: 25, 'total': 60, 'yield': '0.29%'}", u'FT2': u"{0.0: 1, 1.0: 1, 'total': 2, 'yield': '0.06%'}", u'FT': u"{0.0: 0, 1.0: 13, 'total': 13, 'yield': '0.18%'}", u'FT_R1': u"{0.0: 93, 1.0: 98, 'total': 191, 'yield': '3.83%'}", u'BIN2_R1': u"{0.0: 92, 1.0: 97, 'total': 189, 'yield': '45.54%'}"}, 4: {u'FT1': u"{0.0: 132, 1.0: 174, 'total': 306, 'yield': '0.71%'}", u'FT3': u"{0.0: 35, 1.0: 36, 'total': 71, 'yield': '0.35%'}", u'FT2': u"{0.0: 8, 1.0: 11, 'total': 19, 'yield': '0.62%'}", u'FT': u"{0.0: 1, 1.0: 37, 'total': 38, 'yield': '0.53%'}", u'FT_R1': u"{0.0: 179, 1.0: 167, 'total': 346, 'yield': '6.94%'}", u'BIN2_R1': u"{0.0: 1, 1.0: 1, 'total': 2, 'yield': '0.48%'}"}, 5: {u'FT1': u"{0.0: 27, 1.0: 28, 'total': 55, 'yield': '0.13%'}", u'FT_R1': u"{0.0: 41, 1.0: 34, 'total': 75, 'yield': '1.50%'}", u'FT3': u"{0.0: 11, 1.0: 9, 'total': 20, 'yield': '0.10%'}", u'FT2': u"{0.0: 2, 1.0: 0, 'total': 2, 'yield': '0.06%'}", u'FT': u"{0.0: 0, 1.0: 4, 'total': 4, 'yield': '0.06%'}"}, 8: {u'FT1': u"{0.0: 76, 1.0: 77, 'total': 153, 'yield': '0.35%'}", u'FT3': u"{0.0: 40, 1.0: 42, 'total': 82, 'yield': '0.40%'}", u'FT2': u"{0.0: 5, 1.0: 8, 'total': 13, 'yield': '0.42%'}", u'FT': u"{0.0: 0, 1.0: 20, 'total': 20, 'yield': '0.28%'}", u'FT_R1': u"{0.0: 131, 1.0: 133, 'total': 264, 'yield': '5.29%'}", u'BIN2_R1': u"{0.0: 1, 1.0: 2, 'total': 3, 'yield': '0.72%'}"}, 9: {u'FT1': u"{0.0: 199, 1.0: 158, 'total': 357, 'yield': '0.83%'}", u'FT3': u"{0.0: 90, 1.0: 62, 'total': 152, 'yield': '0.75%'}", u'FT2': u"{0.0: 8, 1.0: 8, 'total': 16, 'yield': '0.52%'}", u'FT': u"{0.0: 0, 1.0: 36, 'total': 36, 'yield': '0.51%'}", u'FT_R1': u"{0.0: 238, 1.0: 238, 'total': 476, 'yield': '9.54%'}", u'BIN2_R1': u"{0.0: 2, 1.0: 2, 'total': 4, 'yield': '0.96%'}"}, 10: {u'FT1': u"{0.0: 56, 1.0: 38, 'total': 94, 'yield': '0.22%'}", u'FT3': u"{0.0: 25, 1.0: 33, 'total': 58, 'yield': '0.28%'}", u'FT2': u"{0.0: 5, 1.0: 1, 'total': 6, 'yield': '0.19%'}", u'FT': u"{0.0: 0, 1.0: 11, 'total': 11, 'yield': '0.15%'}", u'FT_R1': u"{0.0: 77, 1.0: 66, 'total': 143, 'yield': '2.87%'}", u'BIN2_R1': u"{0.0: 2, 1.0: 0, 'total': 2, 'yield': '0.48%'}"}, 11: {u'FT1': u"{0.0: 2, 1.0: 0, 'total': 2, 'yield': '0.00%'}", u'FT3': u"{0.0: 1, 1.0: 0, 'total': 1, 'yield': '0.00%'}", u'BIN2_R1': u"{0.0: 1, 1.0: 0, 'total': 1, 'yield': '0.24%'}"}, 12: {u'FT1': u"{0.0: 6, 1.0: 0, 'total': 6, 'yield': '0.01%'}", u'FT3': u"{0.0: 2, 1.0: 0, 'total': 2, 'yield': '0.01%'}", u'FT_R1': u"{0.0: 1, 1.0: 0, 'total': 1, 'yield': '0.02%'}"}, 13: {u'FT1': u"{0.0: 953, 1.0: 422, 'total': 1375, 'yield': '3.19%'}", u'FT3': u"{0.0: 544, 1.0: 292, 'total': 836, 'yield': '4.11%'}", u'FT2': u"{0.0: 88, 1.0: 28, 'total': 116, 'yield': '3.77%'}", u'FT': u"{0.0: 21, 1.0: 147, 'total': 168, 'yield': '2.36%'}", u'FT_R1': u"{0.0: 289, 1.0: 225, 'total': 514, 'yield': '10.30%'}", u'QA_R1': u"{0.0: 1, 'total': 1, 'yield': '12.50%'}", u'QA': u"{1.0: 3, 'total': 3, 'yield': '0.40%'}", u'BIN2_R1': u"{0.0: 4, 1.0: 5, 'total': 9, 'yield': '2.17%'}"}, 14: {u'FT1': u"{0.0: 31, 1.0: 18, 'total': 49, 'yield': '0.11%'}", u'FT_R1': u"{0.0: 35, 1.0: 39, 'total': 74, 'yield': '1.48%'}", u'FT3': u"{0.0: 16, 1.0: 7, 'total': 23, 'yield': '0.11%'}", u'FT2': u"{0.0: 2, 1.0: 1, 'total': 3, 'yield': '0.10%'}", u'FT': u"{0.0: 0, 1.0: 6, 'total': 6, 'yield': '0.08%'}"}}
df = pd.DataFrame.from_dict(d, orient='index')
#print (df)

df = df.fillna('{}')

for col in df.columns:
    df[col] =  df[col].map(lambda d : ast.literal_eval(d))

#print (df)

dfs = [pd.DataFrame([x for x in df[col]], index=df.index) for col in df.columns]

df1 = pd.concat(dfs, axis=1, keys=df.columns)
print (df1)

      FT                           FT_R1                             QA  \
     0.0     1.0   total   yield     0.0     1.0   total   yield    1.0   
1   82.0  6517.0  6599.0  92.70%  1262.0  1418.0  2680.0  53.73%  750.0   
2   51.0   173.0   224.0   3.15%   138.0    86.0   224.0   4.49%    5.0   
3    0.0    13.0    13.0   0.18%    93.0    98.0   191.0   3.83%    NaN   
4    1.0    37.0    38.0   0.53%   179.0   167.0   346.0   6.94%    NaN   
5    0.0     4.0     4.0   0.06%    41.0    34.0    75.0   1.50%    NaN   
8    0.0    20.0    20.0   0.28%   131.0   133.0   264.0   5.29%    NaN   
9    0.0    36.0    36.0   0.51%   238.0   238.0   476.0   9.54%    NaN   
10   0.0    11.0    11.0   0.15%    77.0    66.0   143.0   2.87%    NaN   
11   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN    NaN   
12   NaN     NaN     NaN     NaN     1.0     0.0     1.0   0.02%    NaN   
13  21.0   147.0   168.0   2.36%   289.0   225.0   514.0  10.30%    3.0   
14   0.0     6.0     6.0   0.08%    35.0    39.0    74.0   1.48%    NaN   

            ...   BIN2_R1           FT3                         FT1         \
    total   ...     total   yield   0.0   1.0  total   yield    0.0    1.0   
1   750.0   ...      54.0  13.01%  9285  9629  18914  92.93%  19732  20495   
2     5.0   ...     151.0  36.39%    81    54    135   0.66%    246    110   
3     NaN   ...     189.0  45.54%    35    25     60   0.29%     72     47   
4     NaN   ...       2.0   0.48%    35    36     71   0.35%    132    174   
5     NaN   ...       NaN     NaN    11     9     20   0.10%     27     28   
8     NaN   ...       3.0   0.72%    40    42     82   0.40%     76     77   
9     NaN   ...       4.0   0.96%    90    62    152   0.75%    199    158   
10    NaN   ...       2.0   0.48%    25    33     58   0.28%     56     38   
11    NaN   ...       1.0   0.24%     1     0      1   0.00%      2      0   
12    NaN   ...       NaN     NaN     2     0      2   0.01%      6      0   
13    3.0   ...       9.0   2.17%   544   292    836   4.11%    953    422   
14    NaN   ...       NaN     NaN    16     7     23   0.11%     31     18   


    total   yield  
1   40227  93.34%  
2     356   0.83%  
3     119   0.28%  
4     306   0.71%  
5      55   0.13%  
8     153   0.35%  
9     357   0.83%  
10     94   0.22%  
11      2   0.00%  
12      6   0.01%  
13   1375   3.19%  
14     49   0.11%