embeepea - 1 year ago 141

Python Question

I am looking for an elegant way to append all the rows from one DataFrame to another DataFrame (both DataFrames having the same index and column structure), but in cases where the same index value appears in both DataFrames, use the row from the second data frame.

So, for example, if I start with:

`df1:`

A B

date

'2015-10-01' 'A1' 'B1'

'2015-10-02' 'A2' 'B2'

'2015-10-03' 'A3' 'B3'

df2:

date A B

'2015-10-02' 'a1' 'b1'

'2015-10-03' 'a2' 'b2'

'2015-10-04' 'a3' 'b3'

I would like the result to be:

`A B`

date

'2015-10-01' 'A1' 'B1'

'2015-10-02' 'a1' 'b1'

'2015-10-03' 'a2' 'b2'

'2015-10-04' 'a3' 'b3'

This is analogous to what I think is called "upsert" in some SQL systems --- a combination of update and insert, in the sense that each row from

`df2`

`df1`

`df1`

`df1`

I have come up with the following

`pd.concat([df1, df2]) # concat the two DataFrames`

.reset_index() # turn 'date' into a regular column

.groupby('date') # group rows by values in the 'date' column

.tail(1) # take the last row in each group

.set_index('date') # restore 'date' as the index

which seems to work, but this relies on the order of the rows in each groupby group always being the same as the original DataFrames, which I haven't checked on, and seems displeasingly convoluted.

Does anyone have any ideas for a more straightforward solution?

Answer Source

One solution is to conatenate `df1`

with new rows in `df2`

(i.e. where the index does not match). Then update the values with those from `df2`

.

```
df = pd.concat([df1, df2[~df2.index.isin(df1.index)]])
df.update(df2)
>>> df
A B
2015-10-01 A1 B1
2015-10-02 a1 b1
2015-10-03 a2 b2
2015-10-04 a3 b3
```

**EDIT:**
Per the suggestion of @chrisb, this can further be simplified as follows:

```
pd.concat([df1[~df1.index.isin(df2.index)], df2])
```

Thanks Chris!