Brian Postow Brian Postow - 3 months ago 21
Python Question

median of multiple files in pandas

I have several csv files that I'm loading into pandas. The contain all the same columns, and almost but not exactly the same indexes. The rows are indexed by a pair (segVar, val).

What I want is a new DataFrame in with the same columns, and the union of the indexes, and each row is the median of the appropriate rows from the other files.

I also need to keep the order of the rows the same. (The orders between the files will be consistent)

This is probably 2 questions: how best to get the union of the indexes, and how to get the medians. But if it can be done in one answer, that's great.

Answer

You can use pd.concat to combine the DataFrames and use groupby on the index:

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 5]}, index = [1, 2, 3])

df1
Out: 
   A  B
1  1  2
2  2  3
3  3  5

df2 = pd.DataFrame({'A': [4, 5, 2], 'B': [1, 6, 3]}, index = [2, 3, 5])

df2
Out: 
   A  B
2  4  1
3  5  6
5  2  3

df3 = pd.DataFrame({'A': [4, 3, 1], 'B': [3, 2, 5]}, index = [3, 4, 5])

df3
Out: 
   A  B
3  4  3
4  3  2
5  1  5

pd.concat([df1, df2, df3]).groupby(level=0).median()
Out: 
     A    B
1  1.0  2.0
2  3.0  2.0
3  4.0  5.0
4  3.0  2.0
5  1.5  4.0
Comments