user3059024 user3059024 - 15 days ago 4
Python Question

Adding extra entry in a multi-indexed pandas dataframe from another multi-indexed pandas dataframe

I have a multi-indexed pandas dataframe that I have used the

groupby
method followed by the
describe
method on to give me the following:

grouped= self.HK_data.groupby(level=[0,1])
summary= grouped.describe()


which gives:

Antibody Time
Customer_Col1A2 0 count 3.000000
mean 0.757589
std 0.188750
min 0.639933
25% 0.648732
50% 0.657532
75% 0.816417
max 0.975302
10 count 3.000000
mean 0.716279
std 0.061939
min 0.665601
25% 0.681757
50% 0.697913
75% 0.741618
max 0.785324
... .........


I have calculated the
SEM
using:

SEM=grouped.mean()/(numpy.sqrt(grouped.count()))


Giving:

Antibody Time
Customer_Col1A2 0 0.437394
10 0.413544
120 0.553361
180 0.502792
20 0.512797
240 0.514609
30 0.505618
300 0.481021
45 0.534658
5 0.425800
60 0.430633
90 0.525115
... .........


How do I
concat
these two frames such that the SEM's become another entry of the summary statistics?

So something like:

Antibody Time
Customer_Col1A2 0 count 3.000000
mean 0.757589
std 0.188750
min 0.639933
25% 0.648732
50% 0.657532
75% 0.816417
max 0.975302
SEM 0.437394
10 count 3.000000
mean 0.716279
std 0.061939
min 0.665601
25% 0.681757
50% 0.697913
75% 0.741618
max 0.785324
SEM 0.413544


I've tried
pandas.concat
but this didn't give me what I want.

Thanks!

Answer

I think you first add third level of MultiIndex , assign new index by MultiIndex.from_tuples and last use concat with sort_index:

HK_data = pd.DataFrame({'Antibody':['Customer_Col1A2','Customer_Col1A2','Customer_Col1A2'],
                   'Time':[0,10,10],
                   'Col':[7,8,9]})
HK_data = HK_data.set_index(['Antibody','Time'])
print (HK_data)
                      Col
Antibody        Time     
Customer_Col1A2 0       7
                10      8
                10      9
grouped= HK_data.groupby(level=[0,1])
summary= grouped.describe()
print (summary)
                                 Col
Antibody        Time                
Customer_Col1A2 0    count  1.000000
                     mean   7.000000
                     std         NaN
                     min    7.000000
                     25%    7.000000
                     50%    7.000000
                     75%    7.000000
                     max    7.000000
                10   count  2.000000
                     mean   8.500000
                     std    0.707107
                     min    8.000000
                     25%    8.250000
                     50%    8.500000
                     75%    8.750000
                     max    9.000000

SEM=grouped.mean()/(np.sqrt(grouped.count()))
#change multiindex
new_index = list(zip(SEM.index.get_level_values('Antibody'),
                     SEM.index.get_level_values('Time'), 
                     ['SEM'] * len(SEM.index)))
SEM.index = pd.MultiIndex.from_tuples(new_index, names=('Antibody','Time', None))

print (SEM)
                               Col
Antibody        Time              
Customer_Col1A2 0    SEM  7.000000
                10   SEM  6.010408
df = pd.concat([summary, SEM]).sort_index()
print (df)
                                 Col
Antibody        Time                
Customer_Col1A2 0    25%    7.000000
                     50%    7.000000
                     75%    7.000000
                     SEM    7.000000
                     count  1.000000
                     max    7.000000
                     mean   7.000000
                     min    7.000000
                     std         NaN
                10   25%    8.250000
                     50%    8.500000
                     75%    8.750000
                     SEM    6.010408
                     count  2.000000
                     max    9.000000
                     mean   8.500000
                     min    8.000000
                     std    0.707107
Comments