MEhsan MEhsan - 5 months ago 8
Python Question

How to Copy the Matching Columns between CSV Files Using Pandas?

I have two dataframes(f1_df and f2_df):

f1_df looks like:

ID,Name,Gender
1,Smith,M
2,John,M


f2_df looks like:

name,gender,city,id


Problem:

I want the code to compare the header of f1_df with f2_df by itself and copy the data of the matching columns using panda.

Output:

the output should be like this:

name,gender,city,id # name,gender,and id are the only matching columns btw f1_df and f2_df
Smith,M, ,1 # the data copied for name, gender, and id columns
John,M, ,2


I am new to Pandas and not sure how to handle the problem. I have tried to do an inner join to the matching columns, but that did not work.

Here is what I have so far:

import pandas as pd

f1_df = pd.read_csv("file1.csv")
f2_df = pd.read_csv("file2.csv")

for i in f1_df:
for j in f2_df:
i = i.lower()
if i == j:
joined = f1_df.join(f2_df)
print joined


Any idea how to solve this?

Answer

try this if you want to merge / join your DFs on common columns:

first lets convert all columns to lower case:

df1.columns = df1.columns.str.lower()
df2.columns = df2.columns.str.lower()

now we can join on common columns

common_cols = df2.columns.intersection(df1.columns).tolist()
joined = df1.set_index(common_cols).join(df2.set_index(common_cols)).reset_index()

Output:

In [259]: joined
Out[259]:
   id   name gender city
0   1  Smith      M  NaN
1   2   John      M  NaN

export to CSV:

In [262]: joined.to_csv('c:/temp/joined.csv', index=False)

c:/temp/joined.csv:

id,name,gender,city
1,Smith,M,
2,John,M,