user1911092 user1911092 - 1 year ago 50
Python Question

Pandas Merge - How to avoid duplicating columns

I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). In the columns, some columns match between the two (currency, adj date) for example.

What is the best way to merge these by index, but to not take two copies of currency and adj date.

Each data frame is 90 columns, so I am trying to avoid writing everything out by hand.

df: currency adj_date data_col1 ...
date cusip
2012-01-01 XSDP USD 2012-01-03 0.45

df2: currency adj_date data_col2 ...
date cusip
2012-01-01 XSDP USD 2012-01-03 0.45

If I do:

dfNew = merge(df, df2, left_index=True, right_index=True, how='outer')

I get

dfNew: currency_x adj_date_x data_col2 ... currency_y adj_date_y
date cusip
2012-01-01 XSDP USD 2012-01-03 0.45 USD 2012-01-03

Thank you!

Answer Source

You can work out the columns that are only in one dataframe and use this to select a subset of columns in the merge

cols_to_use = df2.columns - df.columns

then perform the merge using this (note this is an index object but it has a handy tolist() method)

dfNew = merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')

This will avoid any columns clashing in the merge

For version 0.15

cols_to_use = df2.columns.difference(df.columns)

thanks @odedbd