pythonbeginner2506 - 7 months ago
Python Question

Counting the number of times a letter occurs at a certain position using python

I'm a python beginner and I've come across this problem and I'm not sure how I'd go about tackling it.

If I have the following sequence/strings:

GATCCG

GTACGC

How do I count how often each letter occurs at each position? For example, G occurs at position one twice across the two sequences, A occurs at position one zero times, etc.

Any help would be appreciated, thank you!

Answer

You can use a combination of defaultdict and enumerate like so:

from collections import defaultdict

sequences = ['GATCCG', 'GTACGC']
d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
for seq in sequences:
    for i, char in enumerate(seq):  # enumerate('abc') yields (0, 'a'), (1, 'b'), (2, 'c')
        d[char][i] += 1

d['C'][3]  # 2
d['C'][4]  # 1
d['C'][5]  # 1

This builds a nested defaultdict keyed first by character and then by position; each value is the number of times that character occurs at that position. Positions are 0-based, so the first letter of each sequence is position 0.
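To tie this back to the example in the question, here is a minimal sketch that reads the counts for the first position. The names g_first, a_first and counts are only illustrative. One thing to be aware of: looking up a missing key in a defaultdict inserts it, so the sketch ends by copying the result into plain dicts, which is an optional step and not part of the approach above.

```python
from collections import defaultdict

sequences = ['GATCCG', 'GTACGC']
d = defaultdict(lambda: defaultdict(int))  # d[char][position] = count
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1

# Positions are 0-based, so "position one" in the question is index 0.
g_first = d['G'][0]  # 2
a_first = d['A'][0]  # 0 (this lookup also inserts key 0 under 'A')

# Optional: copy into plain dicts so later lookups cannot add keys by accident.
counts = {char: dict(positions) for char, positions in d.items()}
```

After the copy, counts['A'].get(0, 0) is a lookup that returns 0 for a missing position without changing the dictionary.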

If you want lists of position-counts:

max_len = max(map(len, sequences))
d = defaultdict(lambda: [0]*max_len)  # d[char] = [pos0, pos1, ...]
for seq in sequences:
    for i, char in enumerate(seq):
        d[char][i] += 1

d['G']  # [2, 0, 0, 0, 1, 1]
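If you would rather index by position first and letter second, a possible alternative (not the approach above, just a sketch) is to transpose the sequences with zip and count each column with collections.Counter. This assumes all sequences have the same length, since zip stops at the shortest one. The name column_counts is illustrative.

```python
from collections import Counter

sequences = ['GATCCG', 'GTACGC']

# zip(*sequences) yields one tuple of letters per position (column).
# Assumption: all sequences have equal length; zip truncates to the shortest.
column_counts = [Counter(column) for column in zip(*sequences)]

first_g = column_counts[0]['G']  # 2
first_a = column_counts[0]['A']  # 0, a Counter returns 0 for missing keys
```

Unlike a defaultdict, a Counter does not insert a key when you look up a letter it has not seen.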