JayA - 1 month ago
Python Question

Combining files based on a date range

I'm very new to scripting and as a result am not sure how best to merge a series of files. I'm attempting to create a Quality Control script that makes sure a nightly load was properly uploaded to the DB (we've noticed that if there's a lag for some reason, the sync will exclude any donations that came in during said lag).

I have a directory of daily synced files labeled as such:

20161031_donations.txt

20161030_donations.txt

20161029_donations.txt

20161028_donations.txt

etc etc


Every file has the same header.

I'd like to merge the last 7 days of files into one file with just 1 header row. I'm mostly struggling with understanding how to wildcard a date range. I've only ever done:

for i in a.txt b.txt c.txt d.txt
do this
done


which is fine for a static merge but not dynamic to integrate into a proper QC script.

I have a Unix background but would like to do this in Python. I'm new to Python, so please explain any suggestions in detail.

Answer

Expanding on Alex Hall's answer, you can keep the header from the first file and skip it in the remaining files when you merge:

from glob import glob
from shutil import copyfileobj

# YYYYMMDD prefixes sort chronologically, so the last 7 names are the 7 newest files
files = sorted(glob('*_donations.txt'))[-7:]

# if you want most recent file first do
# files.reverse()

with open("merged_file.txt", "w") as outfile:
    for i, filename in enumerate(files):
        with open(filename) as infile:
            if i:
                next(infile)              # discard header (every file after the first)
            copyfileobj(infile, outfile)  # write remaining
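Note that the glob slice above picks the 7 newest files, which is not quite the same as the last 7 calendar days if a nightly file is missing. If that distinction matters for a QC script, one possible variation is to build the expected file names from dates instead. This is only a sketch: the helper names `recent_donation_files` and `merge_with_single_header` are made up for illustration, and it assumes "last 7 days" includes today.

```python
from datetime import date, timedelta
from pathlib import Path


def recent_donation_files(directory, days=7, today=None):
    """Return the YYYYMMDD_donations.txt files that exist for the last
    `days` calendar days (assumed to include today), oldest first."""
    today = today or date.today()
    # Walk from the oldest day in the window up to today.
    wanted = (today - timedelta(days=offset) for offset in range(days - 1, -1, -1))
    paths = (Path(directory) / f"{d:%Y%m%d}_donations.txt" for d in wanted)
    # Days without a file are silently skipped here.
    return [p for p in paths if p.exists()]


def merge_with_single_header(paths, out_path):
    """Copy the header from the first file only, then the data rows of every file."""
    with open(out_path, "w") as outfile:
        for index, path in enumerate(paths):
            with open(path) as infile:
                header = infile.readline()
                if index == 0:
                    outfile.write(header)
                for line in infile:
                    outfile.write(line)
```

You would call it as `merge_with_single_header(recent_donation_files("."), "merged_file.txt")`. Because missing days are skipped, comparing `len(recent_donation_files("."))` against 7 could itself serve as a quick check that every nightly file arrived.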