VM1 VM1 - 10 months ago 141
Python Question

pandas.concat of multiple data frames using only common columns

I have multiple pandas data frame objects cost1, cost2 , cost3 ....

  1. They have different column names (and number of columns) but have some in common.

  2. Number of columns are fairly large in each dataframe, hence handpicking the common columns manually will be painful.

How can I append rows from all of these data frames into one single data frame
while retaining elements from only the common column names ?

As of now I have


new_combined = pd.concat( frames,ignore_index=True)

This obviously contains columns which are not common across all data frames.

Answer Source

You can find the common columns with Python's set.intersection:

common_cols = list(set.intersection(*(set(df.columns) for df in frames)))

To concatenate using only the common columns, you can use

pd.concat([df[common_cols] for df in frames], ignore_index=True)