maschu maschu - 5 months ago 65
Python Question

Subtract pandas dataframes while leaving some columns intact

Suppose I have two dataframes:

data1 = """
idx, stat, val
1, 1, 5
2, 1, 10
3, 2, 15
4, 3, 20
"""
data2 = """
idx, stat, val
2, 1, 8
4, 5, 16
"""


(sorry - I don't know how to code this easily in pandas - I would read them in from csv in real life).

I would like to subtract these two dataframes so that only the val column is subtracted (in real life these are station coordinates). The result should be NaN if only one of the two dataframes has the respective index value (column idx in the ascii format). Also ok if non-matching rows are immediately removed (I would call dropna afterwards anyhow).

Hence, my desired result would be:

desired = """
idx, stat, val
1, 1, NaN
2, 1, 2
3, 2, NaN
4, 3, 4
"""


In principle
data1.sub(data2)
works nicely -- except that I cannot see a way how to "protect" the stat column from the subtraction. I also tried
data1.sub(data2['val'], axis=0)
, but this messed up the row matching which is automatically performed in
data1.sub(data2)
. A workaround is to re-substitute the stat column(s) afterwards with the original values from
data1
, but this seems rather clumsy.

Answer

You can use:

df1.set_index(['idx','stat'], inplace=True)
df2.set_index('idx', inplace=True)

print (df1.sub(df2[['val']]))
          val
idx stat     
1   1     NaN
2   1     2.0
3   2     NaN
4   3     4.0

print (df1.sub(df2[['val']]).reset_index())
   idx  stat  val
0    1     1  NaN
1    2     1  2.0
2    3     2  NaN
3    4     3  4.0

If idx are indexes in both df:

print (df1)
     stat  val
idx           
1       1    5
2       1   10
3       2   15
4       3   20

print (df2)
     stat  val
idx           
2       1    8
4       5   16

df1.set_index('stat', append=True, inplace=True)
print (df1.sub(df2[['val']]).reset_index())
   idx  stat  val
0    1     1  NaN
1    2     1  2.0
2    3     2  NaN
3    4     3  4.0