Andreas - 7 months ago 51

Python Question

I am working on a large MultiIndex DataFrame that contains several index levels, e.g. `segment`, `period` and `classification`, and several result columns, e.g. `Results1` and `Results2`. The DataFrame `consolidated_df` is set up as follows:

```
import pandas as pd
import numpy as np

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']

index_constr = pd.MultiIndex.from_product(
    [segments, periods, classification],
    names=['Segment', 'Period', 'Classification'])

consolidated_df = pd.DataFrame(np.nan, index=index_constr,
                               columns=['Results1', 'Results2'])
print(consolidated_df)
```

The structure (of the large DataFrame) is as follows:

```
                               Results1  Results2
Segment Period Classification
A       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
B       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
C       1      x                    NaN       NaN
               y                    NaN       NaN
        2      x                    NaN       NaN
               y                    NaN       NaN
```

I am running a for loop over all my `segments` (`A`, `B`, `C`) and calling a function `calc_function` for each one.

This function returns a DataFrame that has the exact same format as the consolidated DataFrame - except that it just reports one segment at a time (i.e. it is a slice of the consolidated DataFrame).

Example:

```
# Each level passed to from_product must be list-like,
# so the single segment is wrapped in a list.
index_result = pd.MultiIndex.from_product(
    [['A'], periods, classification],
    names=['Segment', 'Period', 'Classification'])

result_calc = pd.DataFrame(np.random.randn(4, 2), index=index_result,
                           columns=['Results1', 'Results2'])
print(result_calc)

                                Results1   Results2
Segment Period Classification
A       1      x               -1.568351   0.386250
               y                0.679170   1.552551
        2      x               -1.190928  -0.765319
               y                3.254929   1.436295
```

I tried using the below approach to store the results DataFrame in the consolidated one, but did not succeed:

```
idx = pd.IndexSlice  # idx is assumed to be the usual alias for pd.IndexSlice

for segment in segments:
    # calc_function returns a DataFrame that has the same structure as consolidated_df
    consolidated_df.loc[idx[segment, :, :], :] = calc_function(segment)
```

Is there a way to easily integrate the smaller DataFrame into the consolidated one?

Answer

Using your example above, how about just `consolidated_df.loc['A'] = result_calc`?

(`.loc['A']` selects every `Period` and `Classification` under segment `A`. The older `.ix` indexer, which accepted the same key, was removed in pandas 1.0.)

```
print(consolidated_df)
                                Results1   Results2
Segment Period Classification
A       1      x                1.290466   0.228978
               y               -0.276959   0.735192
        2      x                0.757339  -0.787502
               y               -0.609848   0.805773
B       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
C       1      x                     NaN        NaN
               y                     NaN        NaN
        2      x                     NaN        NaN
               y                     NaN        NaN
```
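Extending this to every segment, the loop from the question might look like the sketch below. `calc_function` here is a hypothetical stand-in that returns seeded random numbers in the same layout as `consolidated_df`; the real function is not shown in the question. The last lines use `pd.concat` as an alternative for the case where every segment is computed anyway. That is a swapped-in technique, not part of the answer above.

```python
import numpy as np
import pandas as pd

segments = ['A', 'B', 'C']
periods = [1, 2]
classification = ['x', 'y']
names = ['Segment', 'Period', 'Classification']
columns = ['Results1', 'Results2']

consolidated_df = pd.DataFrame(
    np.nan,
    index=pd.MultiIndex.from_product(
        [segments, periods, classification], names=names),
    columns=columns)


def calc_function(segment):
    """Hypothetical stand-in: one segment's results, same layout as consolidated_df."""
    index = pd.MultiIndex.from_product(
        [[segment], periods, classification], names=names)
    # Seeded per segment so repeated calls return identical numbers.
    rng = np.random.default_rng(segments.index(segment))
    values = rng.standard_normal((len(index), len(columns)))
    return pd.DataFrame(values, index=index, columns=columns)


# Write each segment's slice into the preallocated frame.
for segment in segments:
    consolidated_df.loc[segment] = calc_function(segment)

print(consolidated_df)

# Alternative: skip the preallocation and stack the per-segment frames.
rebuilt = pd.concat([calc_function(segment) for segment in segments])
```

The `.loc` version suits cases where only some segments are recalculated, since untouched rows keep their existing values. The `pd.concat` version avoids the NaN-filled placeholder altogether.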

Source (Stackoverflow)