tktktk0711 tktktk0711 - 3 months ago 11
Python Question

python pandas: merge two data frame but didn't merge the repeat rows

I have two dataframe: df1 and df2.

df1 is following:

name exist
a 1
b 1
c 1
d 1
e 1


df2 (just have one column:name)is following:

name
e
f
g
a
h


I want to merge these two dataframe, and didn't merge repeat names, I mean, if the name in df2 exist in df1, just show one time, else if the name is df2 not exist in df1, set the exist value is 0 or Nan. for example as df1(there is a and e), and df2(there is a and e, just showed a, e one time), I want to be the following df:

a 1
b 1
c 1
d 1
e 1
f 0
g 0
h 0


I used the concat function to do it, my code is following:

import pandas as pd


df1 = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
'exist': ['1', '1', '1', '1', '1']})
df2 = pd.DataFrame({'name': ['e', 'f', 'g', 'h', 'a']})
df = pd.concat([df1, df2])
print(df)


but the result is wrong(name a and e is repeated to be showed):

exist name
0 1 a
1 1 b
2 1 c
3 1 d
4 1 e
0 NaN e
1 NaN f
2 NaN g
3 NaN h
4 NaN a


please give your hands, thanks in advance!

Answer

As indicated by your title, you can use merge instead of concat and specify how parameter as outer since you want to keep all records from df1 and df2 which defines an outer join:

import pandas as pd
pd.merge(df1, df2, on = 'name', how = 'outer').fillna(0)

# exist name
# 0   1    a
# 1   1    b
# 2   1    c
# 3   1    d
# 4   1    e
# 5   0    f
# 6   0    g
# 7   0    h