Keith Pham Keith Pham -4 years ago 118
Python Question

Compute Average/Mean across Dataframes in Python Pandas

I have a list of dataframes. Each dataframe was originally numerical data taken from which are all shaped identically with 21 rows and 5 columns. The first column is an index (index 0 to index 20). I want to compute the average (mean) values into a single dataframe. Then I want to export the dataframe to excel.

Here's a simplified version of my existing code:

#look to concatenate the dataframes together all at once
#dataFrameList is the given list of dataFrames
concatenatedDataframes = pd.concat(dataFrameList, axis = 1)

#grouping the dataframes by the index, which is the same across all of the dataframes
groupedByIndex = concatenatedDataframes.groupby(level = 0)

#take the mean
meanDataFrame = groupedByIndex.mean()

# Create a Pandas Excel writer using openpyxl as the engine.
writer = pd.ExcelWriter(filepath, engine='openpyxl')

However, when I open the excel file, I see what looks like EVERY dataframe is copied into the sheet and the average/mean values are not shown. A simplified example is shown below (cutting most of the rows and dataframes)

Dataframe 1 Dataframe 2 Dataframe 3
Index Col2 Col3 Col4 Col5 Col2 Col3 Col4 Col5 Col2 Col3 Col4 Col5
0 Data Data Data Data Data Data Data Data Data Data Data Data
1 Data Data Data Data Data Data Data Data Data Data Data Data
2 Data Data Data Data Data Data Data Data Data Data Data Data

I'm looking for something more like:

Averaged DF
Index Col2 Col3 Col4
0 Mean Index0,Col2 across DFs Mean Index0,Col3 across DFs Mean Index0,Col4 across DFs
1 Mean Index1,Col2 across DFs Mean Index1,Col3 across DFs Mean Index1,Col4 across DFs
2 Mean Index2,Col2 across DFs Mean Index2,Col3 across DFs Mean Index3,Col4 across DFs

I have also already seen this answer:
Get the mean across multiple Pandas DataFrames

If possible, I'm looking for a clean solution, not one which would simply involve looping through each dataFrame value by value. Any suggestions?

Answer Source

Perhaps I misunderstood what you asked

The solution is simple. You just need to concat along the correct axis

dummy data

df1 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[10 + i * j for j in range(columns)] for i in range(rows) ]) df2 = df1 = pd.DataFrame(index=range(rows), columns=range(columns), data=[[i + j for j in range(columns)] for i in range(rows) ])

ps. this should be your job as OP


df_concat0 = pd.concat((df1, df2), axis=1)

puts all the dataframes next to eachother.

    0   1   0   1
0   10  10  0   1
1   10  11  1   2
2   10  12  2   3

If we want to do a groupby now, we first need to stack, groupby and stack again


    0   1
0   5.0     5.5
1   5.5     6.5
2   6.0     7.5

If we do

df_concat = pd.concat((df1, df2))

This puts all the dataframes on top of eachother

    0   1
0   10  10
1   10  11
2   10  12
0   0   1
1   1   2
2   2   3

now we need to just groupby the index, like you did


    0   1
0   5.0     5.5
1   5.5     6.5
2   6.0     7.5

and then use ExcelWriter as context manager

with pd.ExcelWriter(filepath, engine='openpyxl') as writer:

or just plain

result.to_excel(filepath, engine='openpyxl') 

if you can overwrite what is is filepath

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download