owen - 6 months ago 13
Python Question

# Reading DNA sequences and counting the A, T, C, G bases in every third position

I want to read DNA sequences (made of A, T, C, and G) and count how many times each base appears in the third position of each triplet.

Example 1:

DNA = AAATTTCCCGGG

The bases in the third position of each triplet are marked here: AA'A'TT'T'CC'C'GG'G'

So for this sequence, A=1 T=1 C=1 G=1.

Example 2:

DNA = ATGGTATTTAAA

AT"G"GT"A"TT"T"AA"A"

I want to count the bases at positions 3, 6, 9, and 12. So for this DNA, A=2 T=1 C=0 G=1.

My text file looks like this:

``````
>seq1
ATGGTATTTAAA
ATCGTTTTTAAA
>seq2
ATGGTATTTAAA
ATCGTTTTTAAA
ATCGTTTTTAAA
>seq3
ATGGTATTTAAA
``````

My code looks like this:

``````
f = open("a.txt","r")
seqlist = []
for line in f.readlines():
    line = line.strip("\n")
    if line.startswith(">"):
        print(line)
    elif line.startswith("A") or line.startswith("T") or line.startswith("C") or line.startswith("G"):
        seq = line
        y = 0
        # indices 2, 5, 8, ... are the 3rd, 6th, 9th, ... positions
        for y in range(2, len(seq), 3):
            x = seq[y]
            print(x)
``````

Now I can get the base in every third position, and I want to put them in a list.

Then I can count the A, T, C, and G.

But I don't know how to put them all in ONE list and get the following results:

``````
seq1 A=3 T=3 C=1 G=1
seq2 A=? T=? C=? G=?
seq3 A=? T=? C=? G=?
``````

Thank you so much for your help.

Here's an option that modifies your code as little as possible:

``````
from collections import Counter

f = open("a.txt","r")
counter = None
for line in f.readlines():
    line = line.strip("\n")
    if line.startswith(">"):
        # a new sequence starts: print the counts of the previous one, if any
        if counter is not None:
            print(counter)
        print(line)
        counter = Counter()
    elif line.startswith("A") or line.startswith("T") or line.startswith("C") or line.startswith("G"):
        seq = line
        y = 0
        for y in range(2, len(seq), 3):
            x = seq[y]
            counter[x] += 1
# print the counts of the last sequence
print(counter)
``````

Output:

``````
>seq1
Counter({'A': 3, 'T': 3, 'C': 1, 'G': 1})
>seq2
Counter({'T': 5, 'A': 4, 'C': 2, 'G': 1})
>seq3
Counter({'A': 2, 'T': 1, 'G': 1})
``````

And here's the same thing but improving your code overall, and formatting the output better:

``````
from collections import Counter

counter = None
bases = 'ATCG'

def print_counter():
    print(' '.join('%s=%s' % (k, counter[k]) for k in bases))

with open("a.txt", "r") as f:  # Always open files like this
    for line in f:  # no need for readlines
        line = line.strip("\n")
        if line.startswith(">"):
            if counter is not None:
                print_counter()
            print(line)
            counter = Counter()
        elif line and line[0] in bases:
            counter.update(line[2::3])
print_counter()
``````

Output:

``````
>seq1
A=3 T=3 C=1 G=1
>seq2
A=4 T=5 C=2 G=1
>seq3
A=2 T=1 C=0 G=1
``````