Altheman - 2 months ago
Python Question

Convert a FASTA file to a tab-delimited file using a Python script

I am a student currently learning how to write scripts in Python. I have been given the following exercise: I have to convert a FASTA file in the following format:

>header 1
AATCTGTGTGATAT
ATATA
AT
>header 2
AATCCTCT


into this:

>header 1 AATCTGTGTGATATATATAAT
>header 2 AATCCTCT


I am having some difficulty getting rid of the whitespace (should I be using line.strip()?). Any help would be very much appreciated.

Answer

This starts a new string at each line beginning with > and keeps joining the following sequence lines onto it until the next > is reached. Each completed string is appended to a running list, which is written out at the end.

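# open file and iterate through the lines, composing each single line as we go
out_lines = []
temp_line = ''
with open('path/to/file', 'r') as fp:
    for line in fp:
        if line.startswith('>'):
            # a new header begins: save the previous record, if there is one
            # (this avoids writing an empty first line)
            if temp_line:
                out_lines.append(temp_line)
            temp_line = line.strip() + '\t'
        else:
            # strip() drops the trailing newline so sequence lines join up
            temp_line += line.strip()

# the last record is not followed by another header, so save it here
if temp_line:
    out_lines.append(temp_line)

with open('path/to/new_file', 'w') as fp_out:
    fp_out.write('\n'.join(out_lines))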
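If you want to check the logic without touching any files, the same idea can be sketched as a small generator. This is only an illustration under a few assumptions: the function name fasta_to_tab is made up for this example, blank lines are skipped, and any sequence lines that appear before the first header are ignored. It is run here on the sample from the question through io.StringIO, but it should work the same on an open file object.

```python
import io


def fasta_to_tab(lines):
    """Yield one 'header<TAB>sequence' string per FASTA record.

    `lines` can be any iterable of strings, e.g. an open file.
    (Hypothetical helper name, used for illustration only.)
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()  # removes the newline and surrounding whitespace
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # a new record starts: emit the previous one first
            if header is not None:
                yield header + '\t' + ''.join(chunks)
            header = line
            chunks = []
        elif header is not None:
            # sequence line belonging to the current header
            chunks.append(line)
    # emit the final record, which has no header after it
    if header is not None:
        yield header + '\t' + ''.join(chunks)


# the sample input from the question
sample = io.StringIO(
    ">header 1\nAATCTGTGTGATAT\nATATA\nAT\n>header 2\nAATCCTCT\n"
)
records = list(fasta_to_tab(sample))
print('\n'.join(records))
```

Collecting the pieces in a list and joining them once per record avoids repeated string concatenation, which may matter for long sequences. For real work, a dedicated parser such as the one in Biopython is probably a safer choice than hand-rolled parsing.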