AMisra - 4 months ago
Python Question

Merge pandas CSVs in different directories

I have CSV files with the same base names in different directories, and I want to merge each pair into a single CSV.


dir1
abcd__diff.csv
efgh__diff.csv

dir2
abcd_diffhere.csv
efgh_diffhere.csv

operation
dir1/abcd_diff.csv join dir2/abcd_diffhere.csv
dir1/efgh_diff.csv join dir2/efgh_diffhere.csv


I want to merge them on a common field. I can use the pandas join operator, but what is the most efficient way to search for and map the matching filenames across the two directories?
I split the filenames on the __ characters, which gives the list of files sharing the same base name in each directory. I could iterate with two nested for loops, but that would not be efficient as I have around 200 files.
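
For concreteness, the brute-force pairing I am trying to avoid would look roughly like this (the paths and the exact split rule are only illustrative):

import os

# naive pairing: compare every dir1 file against every dir2 file
pairs = []
for f1 in os.listdir('./dir1/'):
    for f2 in os.listdir('./dir2/'):
        # match on the part of the name before the first underscore
        if f1.split('_')[0] == f2.split('_')[0]:
            pairs.append((os.path.join('./dir1/', f1),
                          os.path.join('./dir2/', f2)))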

Answer

First, pair the files up like this:

import os

files1 = []
files2 = []
dir1path = './dir1/'
dir2path = './dir2/'
dir1 = os.listdir(dir1path)
dir2 = os.listdir(dir2path)

# for each dir1 file, build the expected dir2 name and keep matching pairs
for f in dir1:
    fmatch = f.split('.csv')[0] + 'here.csv'
    if fmatch in dir2:
        files1.append(f)
        files2.append(fmatch)

files1 = [os.path.join(dir1path, f) for f in files1]
files2 = [os.path.join(dir2path, f) for f in files2]

fpairs = zip(files1, files2)
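
Since efficiency is the concern here: the test fmatch in dir2 is a linear scan of a list, so the loop above is roughly O(n*m). With ~200 files that cost is negligible, but if the directories grow, reading dir2 into a set makes each membership test O(1); the only change needed is

dir2 = set(os.listdir(dir2path))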

Then create the list of merged dataframes:

import pandas as pd

# edit this lambda function according to your needs;
# it will have to be specific to your csv formatting
rf = lambda f: pd.read_csv(f)

# 'Key' is the common field to join on; change it to match your data
dfs = [rf(fp[0]).merge(rf(fp[1]), on='Key') for fp in fpairs]
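
If the merged results should end up on disk, each dataframe in dfs can be written back out. Here the output directory ./merged/ is just a placeholder and is assumed to exist:

# write each merged pair out, reusing the dir1 filename
for path1, df in zip(files1, dfs):
    df.to_csv(os.path.join('./merged/', os.path.basename(path1)), index=False)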