Cleb Cleb - 5 months ago 23
Python Question

Subtracting dataframes with unequal numbers of rows

I have two dataframes like this

import pandas as pd
import numpy as np

np.random.seed(0)

df1 = pd.DataFrame(np.random.randint(10, size=(5, 4)), index=list('ABCDE'), columns=list('abcd'))
df2 = pd.DataFrame(np.random.randint(10, size=(2, 4)), index=list('CE'), columns=list('abcd'))

a b c d
A 5 0 3 3
B 7 9 3 5
C 2 4 7 6
D 8 8 1 6
E 7 7 8 1

a b c d
C 5 9 8 9
E 4 3 0 3


The index of
df2
is always a subset of the index of
df1
and the column names are identical.

I want to create a third dataframe
df3 = df1 - df2
. If one does that, one obtains

a b c d
A NaN NaN NaN NaN
B NaN NaN NaN NaN
C -3.0 -5.0 -1.0 -3.0
D NaN NaN NaN NaN
E 3.0 4.0 8.0 -2.0


I don't want the
NAs
in the ouput but the respective values of
df1
. Is there a smart way of using e.g.
fillna
with the values of
df1
in the rows not contained in
df2
?

A workaround would be to do the subtract only the required rows like:

sub_ind = df2.index
df3 = df1.copy()
df3.loc[sub_ind, :] = df1.loc[sub_ind, :] - df2.loc[sub_ind, :]


which gives me the desired output

a b c d
A 5 0 3 3
B 7 9 3 5
C -3 -5 -1 -3
D 8 8 1 6
E 3 4 8 -2


but maybe there is a more straightforward way of achieving this?

Answer Source

If you use the sub method instead of -, you can pass a fill value:

df1.sub(df2, fill_value=0)
Out: 
     a    b    c    d
A  5.0  0.0  3.0  3.0
B  7.0  9.0  3.0  5.0
C -3.0 -5.0 -1.0 -3.0
D  8.0  8.0  1.0  6.0
E  3.0  4.0  8.0 -2.0