Biotechgeek - 1 month ago
Python Question

How to compare the elements of each sublist in a list of lists, and the dictionary values they map to, in Python?

I have the following sequence:

seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]


Here is a dictionary that stores the amino acid for each codon (a triplet of bases such as ATG or GCT):

aminoacid = {'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCC' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGG' : 'R','GGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G'}
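A dict literal that repeats a key does not raise an error: Python silently keeps the last value, so a typo such as writing 'AGC' twice remaps that codon and leaves another one undefined. Since a complete codon table has 4^3 = 64 entries, a quick sanity check is to generate every codon with itertools.product and list the ones the table lacks. This is a minimal sketch; the helper name missing_codons and the small broken table are hypothetical, for illustration only.

```python
from itertools import product

def missing_codons(table):
    """Hypothetical helper: return the sorted codons that a codon table lacks."""
    all_codons = {''.join(p) for p in product('ACGT', repeat=3)}  # 4**3 = 64 codons
    return sorted(all_codons - set(table))

# Hypothetical, deliberately broken table: 'AGC' typed twice, so 'AGG' is never defined.
broken = {'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGC': 'R'}
print(broken['AGC'])                     # the later duplicate wins: R
print('AGG' in missing_codons(broken))   # True
```

Running missing_codons on the full table should return an empty list.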


As one can see, several codons can code for the same amino acid (e.g. GGT, GGC, GGA and GGG all code for glycine, G). Such codons are synonymous (PSyn); codons that code for different amino acids are non-synonymous (PNonsyn).
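To see which codons are synonymous, the codon table can be inverted into amino acid → codons. A minimal sketch using collections.defaultdict; the function name synonymous_groups and the small table subset are illustrative assumptions.

```python
from collections import defaultdict

def synonymous_groups(table):
    """Invert a codon -> amino-acid table into amino acid -> sorted list of codons."""
    groups = defaultdict(list)
    for codon, aa in table.items():
        groups[aa].append(codon)
    return {aa: sorted(codons) for aa, codons in groups.items()}

# Small illustrative subset of the codon table
table = {'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G', 'GAT': 'D', 'GAC': 'D', 'GAA': 'E'}
groups = synonymous_groups(table)
print(groups['G'])  # ['GGA', 'GGC', 'GGG', 'GGT']
```

Any amino acid whose list has more than one codon has synonymous codons.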

In this code, I need to do the following:


  1. For each sublist in the list of lists: if the codons differ AND they all code for the same amino acid, increment PSyn by 1; if they code for different amino acids, increment PNonsyn by 1.

    Here,

    ATG all code for M #However, all are ATG, so there is no change in bases and no increment in count

    GAC, GAT for D; GAA for E; and CCT for P #Codes for three different amino acids, increment PNonsyn by 1

    GCC, GCG, GCA, GCT for A #Different bases but all code for the same amino acid, increment PSyn by 1


    Output:
    CountPsyn = 1

    CountPNonsyn = 1

  2. Generate a list with one entry per sublist of seq, such that:

    Output : ['ATG','nonsyn','A'] #For sites with different amino acids the list should say nonsyn, for sites with identical bases it should list the codon, and for synonymous sites it should list the shared amino acid



I need help modifying the following code to get the program to work. I am not sure how to look up values in the dictionary and check them against all the elements.
Code attempted:

countPsyn = 0
countPnonsyn = 0
listofaa =[]

for i in seq:
    for base, value in enumerate(i):
        if value[i] == value[i+1]: #eg. ['ATG','ATG','ATG','ATG']
            listofaa.append(value)

        if value[i] != value[i+1]:
            if aminoacid[value][i] == aminoacid[value][i+1]: #eg.['GCC','GCG','GCA','GCT']
                countPsyn =+ 1
                listofaa.append(aminoacid)
            else: #eg. ['GAC','GAT','GAA','CCT']
                countPnonsyn =+ 1
                listofaa.append('nonsyn')
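A sketch that stays close to the attempted loop, assuming a trimmed codon table that covers only the codons in seq. Three changes make it run: compare the codons themselves (zip pairs each codon with the next one, which is what value[i] == value[i+1] was reaching for), look up aminoacid[codon] rather than aminoacid[value][i], and use += instead of =+ (countPsyn =+ 1 just assigns +1).

```python
# Assumption: trimmed codon table covering only the codons used in seq
aminoacid = {'ATG': 'M', 'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'CCT': 'P',
             'GCC': 'A', 'GCG': 'A', 'GCA': 'A', 'GCT': 'A'}
seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

countPsyn = 0
countPnonsyn = 0
listofaa = []

for site in seq:
    # Compare each codon with the next one, as in value[i] == value[i+1]
    same_bases = all(a == b for a, b in zip(site, site[1:]))
    # Same comparison, but on the translated amino acids
    same_acid = all(aminoacid[a] == aminoacid[b] for a, b in zip(site, site[1:]))
    if same_bases:
        listofaa.append(site[0])              # identical bases: list the codon
    elif same_acid:
        countPsyn += 1                        # += increments; =+ 1 would assign +1
        listofaa.append(aminoacid[site[0]])   # synonymous: list the amino acid
    else:
        countPnonsyn += 1
        listofaa.append('nonsyn')

print(countPsyn, countPnonsyn, listofaa)  # 1 1 ['ATG', 'nonsyn', 'A']
```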

The file output can be found here: https://eval.in/669107

Answer

Here is my stab at the solution.

aminoacid = {'GCC' : 'A','TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCG' : 'A','GCA' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGG' : 'R','GGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G'}

seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

Psyn = 0
PNonsyn = 0
output = []

# loop through each sublist (site) in the list of lists
for sublist in seq:
    acids = [aminoacid[base] for base in sublist]
    if len(set(acids)) != 1: # if there are different amino acids, then nonsyn
        output.append('nonsyn')
        PNonsyn += 1
    else: # if same amino acid
        if len(set(sublist)) == 1: # if same bases
            output.append(sublist[0])
        else: # if bases differ
            output.append(acids[0])
            Psyn += 1

print("Psyn = " + str(Psyn))
print("PNonsyn = " + str(PNonsyn))
print(output)

Admittedly it's not a modification of your code, but there is a neat trick here to avoid the double for loop. Given a list mylist, you can find all unique elements by calling set(mylist), e.g.

>>> a = ['AGT','AGT','ACG']
>>> set(a)
set(['AGT', 'ACG'])
>>> len(set(a))
2
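The same trick separates the two kinds of change at a single site: count the distinct codons, then count the distinct amino acids they translate to. A minimal sketch with a set comprehension, assuming a trimmed table that contains only the alanine codons:

```python
# Assumption: trimmed table containing only the alanine codons
aminoacid = {'GCC': 'A', 'GCG': 'A', 'GCA': 'A', 'GCT': 'A'}
site = ['GCC', 'GCG', 'GCA', 'GCT']

n_codons = len(set(site))                     # distinct codons at this site
n_acids = len({aminoacid[c] for c in site})   # distinct amino acids they code for
print(n_codons, n_acids)                      # 4 1 -> bases changed, amino acid did not: synonymous
```

More than one codon with exactly one amino acid means a synonymous site; more than one amino acid means non-synonymous.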