Biotechgeek - 1 month ago
Python Question

How to compare non-identical lists and derive values from a dictionary in Python?

Here is a dictionary that stores the single-letter amino acid for each codon (a triplet of bases such as ATG or GCT).

aminoacid = {'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCC' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGG' : 'R','GGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G',}


As one can see, several codons can code for the same amino acid (e.g. GGT, GGC, GGA and GGG all code for glycine (G)). Such codons are synonymous (DSyn); codons that code for different amino acids are non-synonymous (DNonsyn).
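The synonymous/non-synonymous distinction can be sketched as a small lookup. This is only an illustration: the names codon_table and is_synonymous are hypothetical, and the table is rebuilt from the standard genetic code (bases ordered T, C, A, G) so that the snippet runs on its own.

```python
from itertools import product

# Standard genetic code with bases ordered T, C, A, G at each position;
# '*' marks a stop codon and is stored as 'STOP' to match the question.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = {
    "".join(bases): ("STOP" if aa == "*" else aa)
    for bases, aa in zip(product(BASES, repeat=3), AMINO)
}

def is_synonymous(codon_a, codon_b):
    """Hypothetical helper: True if both codons translate to the same amino acid."""
    return codon_table[codon_a] == codon_table[codon_b]

print(len(codon_table))             # 64
print(is_synonymous("GGT", "GGA"))  # True  (both glycine)
print(is_synonymous("ACT", "ATT"))  # False (threonine vs isoleucine)
```

Checking that the table has exactly 64 keys is a cheap way to catch a duplicated or mistyped codon in a hand-written dictionary, since a repeated key silently overwrites the earlier entry.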

This is an extension of this question, if anyone is interested.

I have the following sequences:

list1 = ['ACT','ACT','nonsyn','G','L']

list2 = ['ACT','ACC','GGT','ATT']


Here,
- list1 is derived from a previous calculation and is a mix of codons, amino acids (single-letter entries) and 'nonsyn' (null) entries.
- list2 is a list containing triplet codons.
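Because list1 mixes three kinds of entries, it may help to label each entry before comparing. The following is a minimal sketch; the function name classify and the exact tests it applies (three letters from A, C, G, T for a codon, one uppercase letter for an amino acid) are assumptions, not something given in the question.

```python
def classify(token):
    """Hypothetical helper: label a list1 entry as 'codon', 'aminoacid' or 'nonsyn'."""
    if token == "nonsyn":
        return "nonsyn"
    # A codon is assumed to be exactly three bases drawn from A, C, G, T.
    if len(token) == 3 and set(token) <= set("ACGT"):
        return "codon"
    # An amino acid is assumed to be a single uppercase letter.
    if len(token) == 1 and token.isalpha() and token.isupper():
        return "aminoacid"
    raise ValueError(f"unrecognised entry: {token!r}")

list1 = ['ACT', 'ACT', 'nonsyn', 'G', 'L']
print([classify(t) for t in list1])
# ['codon', 'codon', 'nonsyn', 'aminoacid', 'aminoacid']
```

Raising an error for anything unexpected makes malformed entries from the earlier calculation visible instead of silently miscounting them.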

In this code, I need to compare list1 and list2. Each element in list1 must be compared only with the corresponding element in list2, as follows:


  1. If codons are present in both lists, compare the bases:
    a. If the bases are identical (e.g. ACT, ACT), do nothing.
    b. If the bases are not identical (e.g. ACT, ACC), look up both amino acids in the dictionary. If the amino acids are the same, increase countDsyn by 1; if they are not the same, increase countDnonsyn by 1.

  2. If 'nonsyn' in list1 is compared to list2, do nothing.

  3. If an amino acid from list1 is compared to list2, look up the corresponding amino acid for the list2 codon in the aminoacid dictionary.
    a. If the amino acids are identical, increment countDsyn by 1.
    b. If the amino acids are not identical, increment countDnonsyn by 1.



Final output for the given case:

Dsyn = 2

Dnonsyn = 1


I need help checking whether the way I look up values in the dictionary is correct inside the if statements.

Code Attempted:

countDsyn = 0
countDnonsyn = 0

for pos1,value1 in enumerate(list1):
    for pos2,value2 in enumerate(list2):
        if value1 in list1 = combination(ATGC,3): #eg. ACT,AGT,TTT etc. There are can be 64 such combinations
            if value1 in list1 == value2 in list2: #eg. ACT, ACT
                #Do nothing
            if value1 in list1 != value1 in list2: #eg. ACT,ACC
                if value1[aminoacid] == value2[aminoacid]:
                    countDsyn =+1
                else:
                    countDnonsyn =+1
        if value1 in list1 = "nonsyn":
            #Do nothing
        if value1 in list1 = (A-Z): #eg. 'G''L' etc.
            if value1 == value2[aminoacid] #eg. comparing 'G' and the aminoacid value of GTT from the dictionary
                countDsyn =+ 1
            if value1 != value2[aminoacid]:
                countDnonsyn =+1
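Two details in the attempt above can be isolated in a few lines: `=+ 1` assigns the value +1 rather than adding 1, and nested loops pair every element of list1 with every element of list2 rather than only with the element at the same position. The variable names below (wrong, right, nested_pairs, position_pairs) are made up for this illustration.

```python
# `=+ 1` parses as `= (+1)`: it assigns 1 each time instead of adding 1.
wrong = 0
right = 0
for _ in range(3):
    wrong =+ 1
    right += 1
print(wrong, right)  # 1 3

list1 = ['ACT', 'ACT', 'nonsyn', 'G', 'L']
list2 = ['ACT', 'ACC', 'GGT', 'ATT']

# Nested loops visit every combination (5 * 4 pairs) ...
nested_pairs = [(a, b) for a in list1 for b in list2]
# ... while zip visits only matching positions and stops at the shorter list.
position_pairs = list(zip(list1, list2))

print(len(nested_pairs))  # 20
print(position_pairs)
# [('ACT', 'ACT'), ('ACT', 'ACC'), ('nonsyn', 'GGT'), ('G', 'ATT')]
```

Note that zip drops the trailing 'L' in list1 because list2 has only four elements.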

Ben
Answer

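You need something like this. zip pairs each element of list1 with the element at the same position in list2, and the dictionary is indexed by codon, i.e. aminoacid[codon] rather than codon[aminoacid]:

countDsyn = 0
countDnonsyn = 0

for value1, value2 in zip(list1, list2):
    # Condition 2 in your question: skip 'nonsyn' entries
    if value1 == 'nonsyn':
        continue
    # Condition 1 in your question: value1 is a codon
    # (identical codons such as ACT, ACT are counted as synonymous here)
    if len(value1) == 3:
        if aminoacid[value1] == aminoacid[value2]:
            countDsyn += 1
        else:
            countDnonsyn += 1
    # Condition 3 in your question: value1 is a single-letter amino acid
    else:
        if aminoacid[value2] == value1:
            countDsyn += 1
        else:
            countDnonsyn += 1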
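For reference, here is one way the loop above could be wrapped in a function and run on the lists from the question. It is a sketch under some assumptions: the function name count_substitutions and the skip_identical flag are hypothetical, and the codon table is rebuilt from the standard genetic code so the snippet is self-contained. The flag exists because rule 1a in the question says identical codons should be ignored, while the loop above counts them as synonymous.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G; '*' (stop) stored as 'STOP'.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
aminoacid = {
    "".join(bases): ("STOP" if aa == "*" else aa)
    for bases, aa in zip(product(BASES, repeat=3), AMINO)
}

def count_substitutions(list1, list2, skip_identical=True):
    """Hypothetical wrapper: return (countDsyn, countDnonsyn) for paired entries."""
    count_dsyn = 0
    count_dnonsyn = 0
    for value1, value2 in zip(list1, list2):
        if value1 == 'nonsyn':
            continue  # rule 2: nothing to compare
        if len(value1) == 3:
            if skip_identical and value1 == value2:
                continue  # rule 1a: identical codons are ignored
            same = aminoacid[value1] == aminoacid[value2]  # rule 1b
        else:
            same = value1 == aminoacid[value2]  # rule 3
        if same:
            count_dsyn += 1
        else:
            count_dnonsyn += 1
    return count_dsyn, count_dnonsyn

list1 = ['ACT', 'ACT', 'nonsyn', 'G', 'L']
list2 = ['ACT', 'ACC', 'GGT', 'ATT']
print(count_substitutions(list1, list2))                        # (1, 1)
print(count_substitutions(list1, list2, skip_identical=False))  # (2, 1)
```

With skip_identical=False the result matches the Dsyn = 2, Dnonsyn = 1 stated in the question; following rule 1a literally gives 1 and 1 for these lists, so it is worth deciding which behaviour is intended.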