Let's say I have a string of DNA 'GAAGGAGCGGCGCCCAAGCTGAGATAGCGGCTAGAGGCGGGTAACCGGCA'
Consider the first 5 letters: GAAGG
I want to replace each overlapping bi-gram ('GA', 'AA', 'AG', 'GG') with a number that corresponds to its likelihood of occurrence, and then sum those numbers. For example, 'GA' = 1, 'AA' = 2, 'AG' = .7, 'GG' = .5. So for GAAGG my sumAnswer would be 1 + 2 + .7 + .5.
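The arithmetic for GAAGG can be checked with a small dictionary lookup. This is a minimal sketch: the values are the illustrative likelihoods from the question (not real bi-gram frequencies), the bi-grams are listed by hand, and `bigram_scores` is a hypothetical name.

```python
# Illustrative likelihoods from the question, not real frequencies
bigram_scores = {"GA": 1, "AA": 2, "AG": 0.7, "GG": 0.5}

# The four overlapping bi-grams of GAAGG, written out by hand
pairs = ["GA", "AA", "AG", "GG"]

# Look up each bi-gram and add the values together
sum_answer = sum(bigram_scores[p] for p in pairs)
print(sum_answer)
```

The result is 4.2, up to floating-point rounding.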
So in pseudocode, I want to:
-iterate over each overlapping bi-gram in my DNA string
-look up the value corresponding to each bi-gram
-add each value to a running sum
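Those three steps could be sketched as follows, assuming the scores are stored in a dict. The question does not say what to do with a bi-gram that has no listed value; this sketch assumes it counts as 0 (raising an error instead would be just as reasonable). `score_sequence` is a hypothetical name.

```python
def score_sequence(seq, scores, default=0.0):
    """Sum the score of every overlapping bi-gram in seq.

    Bi-grams missing from `scores` contribute `default` (an assumption).
    """
    total = 0.0
    # Stop at len(seq) - 1 so that seq[i:i+2] is always a full pair
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]                 # 1. next overlapping bi-gram
        value = scores.get(pair, default)   # 2. look up its value
        total += value                      # 3. add it to the running sum
    return total


bigram_scores = {"GA": 1, "AA": 2, "AG": 0.7, "GG": 0.5}
print(score_sequence("GAAGG", bigram_scores))
```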
I'm not entirely sure how to iterate over each pair. I thought a for loop would work, but it doesn't account for the overlap: it prints every non-overlapping pair (GAGC → GA, GC), not every overlapping pair (GAGC → GA, AG, GC):
for i in range(0, len(input), 2):
    print(input[i:i+2])
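The difference between the two loops is easiest to see by collecting the slices for the GAGC example into lists. This sketch uses list comprehensions in place of print statements; `seq` is just an illustrative variable name.

```python
seq = "GAGC"

# Step of 2: each index starts a new pair, so pairs never overlap
non_overlapping = [seq[i:i + 2] for i in range(0, len(seq), 2)]

# Step of 1 (the default), stopping one short of the end: overlapping pairs
overlapping = [seq[i:i + 2] for i in range(len(seq) - 1)]

print(non_overlapping)  # ['GA', 'GC']
print(overlapping)      # ['GA', 'AG', 'GC']
```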
Just leave out the ,2 in your range, and stop one position before the end of your string so that the last slice is still a full pair:
for i in range(0, len(input) - 1):
    print(input[i:i+2])
The ,2 is the step argument of range: it tells Python to advance two positions on every iteration. If you leave it out, the step defaults to 1.
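An alternative that avoids index arithmetic entirely is to pair the string with itself shifted by one using the built-in zip. Because zip stops when the shorter input (seq[1:]) runs out, the final character never produces a dangling one-letter "pair", so the - 1 boundary is handled automatically. This is a sketch, not the answer's method; the function names are hypothetical, and unlisted bi-grams are again assumed to score 0.

```python
def overlapping_bigrams(seq):
    # zip stops at the end of seq[1:], one element before the end of seq
    return [a + b for a, b in zip(seq, seq[1:])]


def score_sequence_zip(seq, scores, default=0.0):
    # Same sum as the index-based loop, written as a generator expression;
    # bi-grams missing from `scores` count as `default` (an assumption)
    return sum(scores.get(pair, default) for pair in overlapping_bigrams(seq))


dna = "GAAGGAGCGGCGCCCAAGCTGAGATAGCGGCTAGAGGCGGGTAACCGGCA"
pairs = overlapping_bigrams(dna)
print(len(dna), len(pairs))  # 50 49
print(pairs[:4])             # ['GA', 'AA', 'AG', 'GG']

bigram_scores = {"GA": 1, "AA": 2, "AG": 0.7, "GG": 0.5}
print(score_sequence_zip("GAAGG", bigram_scores))
```

A string of length n always yields n - 1 overlapping bi-grams, which is why the index-based loop must stop at len(input) - 1.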