Matthijs Matthijs - 1 month ago 6
Python Question

Python: appending/merging multiple csv files respecting headers and write to csv

[Using Python3] I'm very new to (Python) programming but nonetheless am writing a script that scans a folder for certain csv files, then I want to read them all and append them and write them into another csv file.

In between it is required that data is returned only where the values in a certain columns are matched to a set criteria.

All csv files have the same columns, and would look somewhere like this:

header1 header2 header3 header4 ...
string float string float ...
string float string float ...
string float string float ...
string float string float ...
... ... ... ... ...


The code I'm working with right now is the following (below), however it just keeps on overwriting the data from the previous file. That does make sense to me, I just cannot figure out how to get it working though.

Code:

import csv
import datetime
import sys
import glob
import itertools
from collections import defaultdict

# Raw data files have the format like '2013-06-04'. To be able to use this script during the whole of 2013, the glob is set to search for the pattern '2013-*.csv'
files = [f for f in glob.glob('2013-*.csv')]

# Output file looks like '20130620-filtered.csv'
outfile = '{:%Y%m%d}-filtered.csv'.format(datetime.datetime.now())

# List of 'Header4' values to be filtered for writing output
header4 = ['string1', 'string2', 'string3', 'string4']

for f in files:
with open(f, 'r') as f_in:
dict_reader = csv.DictReader(f_in)

with open(outfile, 'w') as f_out:
dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
dict_writer.writeheader()
for row in dict_reader:
if row['Campaign'] in campaign_names:
dict_writer.writerow(row)


I also tried something like
readers = list(itertools.chain(*map(lambda f: csv.DictReader(open(f)), files)))
, and trying to iterate over the readers however then I cannot figure out how to work with the headers. (I get the error that itertools.chain() does not have the fieldnames attribute).

Any help is very much appreciated!

Answer

You keep re-opening the file and overwriting it.

Open outfile once, before your loops start. For the first file you read, write the header and the rows. For rest of the files, just write the rows.

Something like

with open(outfile, 'w') as f_out:
    dict_writer = None
    for f in files:
        with open(f, 'r') as f_in:
            dict_reader = csv.DictReader(f_in)
            if not dict_writer:
                dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
                dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)
Comments