Marina Mitie Monobe Marina Mitie Monobe - 3 months ago 10
Python Question

How to calculate each string length that belongs to a list of strings by python?

Suppose I have a file with

n
DNA sequences, each one in a line. I need to turn them into a list and then calculate each sequence's length and then total length of all of them together. I am not sure how to do that before they are into a list.

# open file and writing each sequences' length
f= open('seq.txt' , 'r')
for line in f:
line= line.strip()
print (line)
print ('this is the length of the given sequence', len(line))

# turning into a list:
lines = [line.strip() for line in open('seq.txt')]
print (lines)


How can I do math calculations from the list? Ex. the total length of all sequences together? Standard deviation from their different lengths etc.

Answer

Just a couple of remarks. Use with to handle files so you don't have to worry about closing them after you are done reading\writing, flushing, etc. Also, since you are looping through the file once, why not create the list too? You don't need to go through it again.

# open file and writing each sequences' length
with open('seq.txt', 'r') as f:
    sequences = []
    total_len = 0
    for line in f:
        new_seq = line.strip()
        sequences.append(new_seq)
        new_seq_len = len(new_seq)
        total_len += new_seq_len

print('number of sequences: {}'.format(len(sequences)))
print('total lenght: {}'.format(total_len))
print('biggest sequence: {}'.format(max(sequences, key=lambda x: len(x))))
print('\t with length {}'.format(len(sorted(sequences, key=lambda x: len(x))[-1])))
print('smallest sequence: {}'.format(min(sequences, key=lambda x: len(x))))
print('\t with length {}'.format(len(sorted(sequences, key=lambda x: len(x))[0])))

I have included some post-processing info to give you an idea of how to go about it. If you have any questions just ask.