Luke Ackerknecht - 2 months ago
Python Question

How to iterate over every overlapping pair of characters in a string of DNA code?

Let's say I have a string of DNA 'GAAGGAGCGGCGCCCAAGCTGAGATAGCGGCTAGAGGCGGGTAACCGGCA'

Consider the first 5 letters: GAAGG

I want to replace each overlapping bi-gram ('GA', 'AA', 'AG', 'GG') with a number that corresponds to its likelihood of occurrence, and then sum those numbers. For example, with 'GA' = 1, 'AA' = 2, 'AG' = .7, 'GG' = .5, GAAGG would give sumAnswer = 1 + 2 + .7 + .5.

So in pseudocode, I want to:
-iterate over each overlapping bi-gram in my DNA string
-find the corresponding value to each unique bi-gram pair
-sum each value iteratively

I'm not entirely sure how to iterate over each pair. I thought a for loop would work, but mine doesn't account for the overlap: it prints every non-overlapping pair (GAGC = GA, GC), not every overlapping pair (GAGC = GA, AG, GC).

for i in range(0, len(input), 2):
    print input[i:i+2]


Any tips?

Answer

Just leave out the ,2 in your range, and stop one position before the end of the string so that the last slice is still a full pair:

for i in range(0, len(input)-1):
    print input[i:i+2]

The ,2 tells Python to step forward two on every iteration. By leaving it out, you default to stepping forward one.
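
To go from printing the pairs to the sum the question asks for, the same loop can feed a dictionary lookup. The following is only a minimal sketch: the names BIGRAM_SCORES, overlapping_bigrams and score_sequence are made up for illustration, the scores are the four example values from the question, and treating a missing bi-gram as 0 is an assumption rather than something the question specifies.

```python
# Hypothetical score table using the example values from the question;
# a real table would need an entry for each of the 16 possible bi-grams.
BIGRAM_SCORES = {'GA': 1.0, 'AA': 2.0, 'AG': 0.7, 'GG': 0.5}


def overlapping_bigrams(dna):
    """Return every overlapping two-character slice of dna, in order."""
    # Stop at len(dna) - 1 so the final slice is still two characters long.
    return [dna[i:i + 2] for i in range(len(dna) - 1)]


def score_sequence(dna, scores=BIGRAM_SCORES):
    """Sum the score of each overlapping bi-gram in dna."""
    # Assumption: a bi-gram missing from the table contributes 0.
    return sum(scores.get(pair, 0.0) for pair in overlapping_bigrams(dna))


print(overlapping_bigrams('GAAGG'))  # ['GA', 'AA', 'AG', 'GG']
print(score_sequence('GAAGG'))       # roughly 4.2, up to float rounding
```

If a missing bi-gram should be an error instead, replacing scores.get(pair, 0.0) with scores[pair] would raise a KeyError on the first unknown pair.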
