Jessica - 1 year ago
Python Question

Giving input and output files for batch processing?

I have more than 100 .txt files in a directory, and I want to run the same Python script on each of them. Right now I have to type a nearly identical command over 100 times, because only the input and output file names differ. I was wondering if this could be done automatically.

My code looks like this:

import pandas as pd
import numpy as np
import os
import argparse

parser = argparse.ArgumentParser(description='Excelseq ')
parser.add_argument('-i','--txt', help='Input file name',required=True)
parser.add_argument('-o','--output',help='output file name', required=True)
args = parser.parse_args()

df = pd.read_csv(args.txt, sep='\t' )
f=open('VD.fasta', "r+")
out = open(args.output, "w")

for line in f:
    title = line[1:]
    title = title.rstrip()

    seq = f.readline()
    seq = seq.rstrip()

    if df['ReadID'].str.contains(title).any():
        out.write('>{0}\n{1}\n'.format(title, seq))

The code takes one input file, given by -i, which is a .txt file, and the script checks whether the ReadID from the .txt file is in the .fasta file. If it is, the script writes out the matching title and sequence. For each output file, I would like the name to be the same as the .txt file but with a .fasta extension.

For example:

input file1 : H100.txt
output file1: H100.fasta

input file2 : H101.txt
output file2: H101.fasta

input file3: H102.txt
output file3: H102.fasta
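
The naming rule in these examples amounts to swapping the extension. A minimal sketch using the standard library's pathlib; the helper name fasta_name_for is made up for illustration:

```python
from pathlib import Path

def fasta_name_for(txt_path):
    """Return txt_path with its extension replaced by .fasta (hypothetical helper)."""
    return str(Path(txt_path).with_suffix('.fasta'))

# The three example names from the question
for name in ['H100.txt', 'H101.txt', 'H102.txt']:
    print(name, '->', fasta_name_for(name))
```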


How would I automate this for over 100 files? Each run takes a long time and I don't want to sit in front of the computer to wait for it to finish and then run the next.

Answer Source

I couldn't test this because I don't have the input files or all the third-party modules you have installed. However, it should be close to what you need, as I was trying to explain in the comments.

import glob
import os
import pandas as pd
import sys

def process_txt_file(txt_filename, f):
    # Output name: same as the input file, but with a .fasta extension
    root, ext = os.path.splitext(txt_filename)
    fasta_filename = root + '.fasta'

    print('processing {} -> {}'.format(txt_filename, fasta_filename))

    df = pd.read_csv(txt_filename, sep='\t')
    f.seek(0)  # rewind VD.fasta so every input file scans it from the start
    with open(fasta_filename, "w") as out:
        for line in f:
            title = line[1:].rstrip()
            seq = next(f, '').rstrip()  # sequence line that follows the title

            if df['ReadID'].str.contains(title).any():
                out.write('>{0}\n{1}\n'.format(title, seq))

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('usage: {} <path-to-txt-files-directory>'.format(sys.argv[0]))
        sys.exit(1)

    # VD.fasta is only read, so plain read mode is enough
    with open('VD.fasta', "r") as f:
        for input_filename in glob.glob(os.path.join(sys.argv[1], 'H*.txt')):
            process_txt_file(input_filename, f)
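
Since VD.fasta is the same for every input, one alternative to re-reading it for each .txt file is to load it into memory once and reuse the records. A rough sketch, assuming the file uses the two-line layout the loop above relies on (a '>' title line followed by a single sequence line) and is small enough to fit in memory; read_fasta_records is a hypothetical helper, not part of any library:

```python
import io

def read_fasta_records(handle):
    """Return a list of (title, seq) pairs from a two-line-per-record FASTA handle."""
    records = []
    for line in handle:
        if not line.startswith('>'):
            continue  # skip blank or unexpected lines
        title = line[1:].rstrip()
        seq = next(handle, '').rstrip()  # the sequence line that follows the title
        records.append((title, seq))
    return records

# Small in-memory stand-in for VD.fasta (made-up data)
sample = io.StringIO('>read1\nACGT\n>read2\nGGCC\n')
records = read_fasta_records(sample)
print(records)
```

process_txt_file could then loop over the records list instead of the open file handle, so the file position no longer matters between inputs.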