Chris Parry Chris Parry - 1 month ago 10
Python Question

Averaging pandas dataframes from certain columns

I am new to pandas. I have several

dfs
. The data in column
0
is the
ID
and in columns
1-10
are probabilities. I want to take the column-wise average of columns
1-10
across the
dfs
. The rows may not be in the same order.

Is there a better way to do it than sort each df on
ID
and then use the add/divide df functions? Any help appreciated.

Thanks very much for your comments. To clarify, I need to average the 2 dfs element wise. I.e. (just showing 1 row of each df):

Df1: id132456, 1, 2, 3, 4
Df2: id132456, 2, 2, 3, 2
Averaged: id132456, 1.5, 2, 3, 3

Answer

It looks like need concat and mean:

import pandas as pd

df1 = pd.DataFrame({0:[14254,25445,34555],
                   1:[1,2,3],
                   2:[1,1,1],
                   3:[1,2,0]})

print (df1)
       0  1  2  3
0  14254  1  1  1
1  25445  2  1  2
2  34555  3  1  0

df2 = pd.DataFrame({0:[14254,25445,34555],
                    2:[1,0,0],
                    1:[1,0,1],
                    3:[1,2,0]})

print (df2)
       0  1  2  3
0  14254  1  1  1
1  25445  0  0  2
2  34555  1  0  0
#list of all DataFrames
dfs = [df1, df2]
print (pd.concat(dfs, ignore_index=True))
       0  1  2  3
0  14254  1  1  1
1  25445  2  1  2
2  34555  3  1  0
3  14254  1  1  1
4  25445  0  0  2
5  34555  1  0  0

#select all columns without first
print (pd.concat(dfs, ignore_index=True).ix[:,1:])
   1  2  3
0  1  1  1
1  2  1  2
2  3  1  0
3  1  1  1
4  0  0  2
5  1  0  0

I am not sure what kind of mean need, so I add both:

#mean per rows
print (pd.concat(dfs, ignore_index=True).ix[:,1:].mean(1))
0    1.000000
1    1.666667
2    1.333333
3    1.000000
4    0.666667
5    0.333333
dtype: float64

#mean per columns
print (pd.concat(dfs, ignore_index=True).ix[:,1:].mean())
1    1.333333
2    0.666667
3    1.000000
dtype: float64

Maybe you need something else:

dfs = [df1.set_index(0), df2.set_index(0)]
print (pd.concat(dfs, ignore_index=True, axis=1))
       0  1  2  3  4  5
0                      
14254  1  1  1  1  1  1
25445  2  1  2  0  0  2
34555  3  1  0  1  0  0

print (pd.concat(dfs, ignore_index=True, axis=1).mean(1))
0
14254    1.000000
25445    1.166667
34555    0.833333
dtype: float64

print (pd.concat(dfs, ignore_index=True, axis=1).mean())
0    2.000000
1    1.000000
2    1.000000
3    0.666667
4    0.333333
5    1.000000
dtype: float64