Andreas - 1 month ago
Python Question

DataFrame combination

I am working on a large MultiIndex DataFrame that contains several index levels, e.g. segment, period, and classification, as well as several columns with results, e.g. Results1 and Results2. The DataFrame consolidated_df is supposed to store all of my calculation results:

import pandas as pd
import numpy as np

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']

index_constr = pd.MultiIndex.from_product(
    [segments, periods, classification],
    names=['Segment', 'Period', 'Classification'])

consolidated_df = pd.DataFrame(np.nan, index=index_constr,
                               columns=['Results1', 'Results2'])

print(consolidated_df)


The structure (of the large DataFrame) is as follows:

                               Results1  Results2
Segment Period Classification
A       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
B       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
C       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN


I am running a for loop over all my segments (A, B and C) to calculate the results, which are stored in the columns of the DataFrame, using a separate function, calc_function. This function returns a DataFrame that has exactly the same format as the consolidated DataFrame, except that it reports only one segment at a time (i.e. it is a slice of the consolidated DataFrame).
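
To make the shape of that return value concrete, here is a minimal stand-in for calc_function. The body is purely hypothetical (the real calculation is not shown in the question); it only fills one segment's rows with seeded random numbers so that the index layout can be inspected.

```python
import numpy as np
import pandas as pd

periods = [1, 2]
classification = ['x', 'y']


def calc_function(segment):
    """Hypothetical stand-in: random results for a single segment."""
    # Each level passed to from_product must be list-like,
    # so the single segment label is wrapped in a list.
    index = pd.MultiIndex.from_product(
        [[segment], periods, classification],
        names=['Segment', 'Period', 'Classification'])
    rng = np.random.default_rng(seed=0)
    values = rng.standard_normal((len(index), 2))
    return pd.DataFrame(values, index=index,
                        columns=['Results1', 'Results2'])


result_b = calc_function('B')
print(result_b)
```

The returned frame keeps all three index levels, with Segment holding a single value.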

Example:

# Each level must be list-like, so the single segment is wrapped in a list
index_result = pd.MultiIndex.from_product(
    [['A'], periods, classification],
    names=['Segment', 'Period', 'Classification'])

result_calc = pd.DataFrame(np.random.randn(4, 2), index=index_result,
                           columns=['Results1', 'Results2'])

print(result_calc)

                               Results1  Results2
Segment Period Classification
A       1      x              -1.568351  0.386250
               y               0.679170  1.552551
        2      x              -1.190928 -0.765319
               y               3.254929  1.436295


I tried using the below approach to store the results DataFrame in the consolidated one, but did not succeed:

for segment in segments:
    # calc_function returns a DataFrame that has the same structure as consolidated_df
    consolidated_df.loc[idx[segment, :, :], :] = calc_function(segment)
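
The loop above uses idx without defining it. Presumably it stands for pd.IndexSlice, the usual pandas helper for slicing MultiIndex levels inside .loc; that is an assumption, since the question does not show the definition. A small sketch of how it selects one segment's rows:

```python
import numpy as np
import pandas as pd

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']

index_constr = pd.MultiIndex.from_product(
    [segments, periods, classification],
    names=['Segment', 'Period', 'Classification'])
consolidated_df = pd.DataFrame(np.nan, index=index_constr,
                               columns=['Results1', 'Results2'])

# Assumption: idx in the question refers to pd.IndexSlice
idx = pd.IndexSlice

# All periods and classifications of segment 'A', all columns
segment_a = consolidated_df.loc[idx['A', :, :], :]
print(segment_a.shape)
```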


Is there a way to easily integrate the smaller DataFrame into the consolidated one?

Answer

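Using your example above, how about just consolidated_df.loc['A'] = result_calc? (On pandas older than 1.0 the same assignment could be written with .ix, which was deprecated in 0.20 and has since been removed.)

(That's equivalent to spelling out every index level: consolidated_df.loc[pd.IndexSlice['A', :, :], :] = result_calc)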

print(consolidated_df)

                               Results1  Results2
Segment Period Classification                    
A       1      x               1.290466  0.228978
               y              -0.276959  0.735192
        2      x               0.757339 -0.787502
               y              -0.609848  0.805773
B       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
C       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
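
Extending this to the loop from the question, a minimal sketch could look like the following. calc_function here is a hypothetical stand-in that returns seeded random numbers, and .loc is used for the assignment on the assumption of a pandas version where .ix is no longer available.

```python
import numpy as np
import pandas as pd

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']
names = ['Segment', 'Period', 'Classification']

consolidated_df = pd.DataFrame(
    np.nan,
    index=pd.MultiIndex.from_product(
        [segments, periods, classification], names=names),
    columns=['Results1', 'Results2'])


def calc_function(segment):
    # Hypothetical stand-in for the real calculation
    index = pd.MultiIndex.from_product(
        [[segment], periods, classification], names=names)
    rng = np.random.default_rng(seed=ord(segment))
    return pd.DataFrame(rng.standard_normal((len(index), 2)),
                        index=index, columns=['Results1', 'Results2'])


for segment in segments:
    # Label-based assignment of one segment's block of rows
    consolidated_df.loc[segment] = calc_function(segment)

print(consolidated_df)
```

For this to fill every row, the frame returned for each segment has to carry the same (Segment, Period, Classification) labels as the target rows in consolidated_df.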