I am fairly new to Python and I have an issue with my
import re

filename = 'test.txt'
seq = ''
with open(filename) as file:
    for line in file:
        header = re.search(r'^>\w+', line)
        if header:
            print(header.group())
            seq = seq.replace('\n','')
            find_Lpattern = re.sub(r'.*AAA', '',seq)
            find_Rpattern = re.sub(r'TTT.*', '',find_Lpattern)
            print(find_Rpattern)
            seq = ''
        else:
            seq += line
Even assuming your indentation is set up in the way that would produce the results you describe, your logic is off: you print the header before you handle the accumulated seq.
When you read line 1 of your file, your header regexp matches. At that point, seq is the empty string, so the code prints the match and then runs your replace and re.sub calls on the empty string.
Then it reads line 2, "AAACTACCGCGTTT", and appends that to seq.
Then it reads line 3, ">seq2". That matches your header regexp, so it prints the header. Then it runs your replace and sub calls on seq, which is still "AAACTACCGCGTTT" from line 2.
You need to move your seq handling to before you print the headers. Also consider what will happen when you run off the end of the file without finding a final header: you will still have seq contents that you want to parse and print after your for loop has ended.
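Here is a minimal sketch of that reordering, not a definitive implementation. The helper names trim and parse_fasta are made up for illustration, the regexps are the ones from the question, and the sample data is assumed: only "AAACTACCGCGTTT" and ">seq2" appear in the discussion above, while ">seq1" and the second sequence are invented for the example.

```python
import io
import re


def trim(seq):
    """Apply the question's two substitutions to one complete sequence."""
    seq = re.sub(r'.*AAA', '', seq)   # drop everything up to and including the last AAA
    return re.sub(r'TTT.*', '', seq)  # drop everything from the first remaining TTT onward


def parse_fasta(lines):
    """Yield (header, trimmed_sequence) pairs from an iterable of FASTA lines."""
    header = None
    seq = ''
    for line in lines:
        line = line.rstrip()
        match = re.search(r'^>\w+', line)
        if match:
            # A new header closes the previous record, so emit that one first.
            if header is not None:
                yield header, trim(seq)
            header = match.group()
            seq = ''
        else:
            seq += line
    # The last record has no header after it, so emit it once the loop ends.
    if header is not None:
        yield header, trim(seq)


# Assumed sample input; with a real file, pass the open file object instead.
sample = io.StringIO(">seq1\nAAACTACCGCGTTT\n>seq2\nAAAGGGCCCTTT\n")
for header, trimmed in parse_fasta(sample):
    print(header, trimmed)
```

With the assumed sample this prints ">seq1 CTACCGCG" and then ">seq2 GGGCCC", so each header is paired with its own sequence and the last record is not lost.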
Or maybe look into the third-party Biopython library, which has the Bio.SeqIO module for parsing FASTA files.
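As a rough sketch of the Biopython route, assuming the package is installed (for example with pip install biopython): SeqIO.parse takes a file name or an open handle plus the format name "fasta" and yields one record per entry, so the header/sequence bookkeeping goes away. The function name trimmed_records is made up for illustration, and the two substitutions are the ones from the question.

```python
import re


def trimmed_records(source):
    """Yield (record_id, trimmed_sequence) for each FASTA record in source."""
    # Imported here so this module still loads where Biopython is not installed.
    from Bio import SeqIO

    for record in SeqIO.parse(source, "fasta"):
        seq = str(record.seq)             # the full sequence, already joined across lines
        seq = re.sub(r'.*AAA', '', seq)   # same left-hand trim as in the question
        seq = re.sub(r'TTT.*', '', seq)   # same right-hand trim as in the question
        yield record.id, seq
```

You would call it as trimmed_records('test.txt') and loop over the pairs. Note that record.id does not include the leading ">", so add it back when printing if you want the output to look like the original header.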