Biotechgeek - 1 month ago
Python Question

How to zip a list of lists in python?

I have a list of lists

sample = [[['A','T','N','N'],['T', 'C', 'C', 'C']],[['A','T','T','N'],['T', 'T', 'C', 'C']]]


I am trying to join each inner list into a string, keeping only A/T/G/C, so that the output is a list of lists:

[['AT','TCCC'],['ATT','TTCC']]


I am using this code:

tt = ["".join(y for y in x if y in {'A','G','T','C'}) for x in sample]


However, the only output I get is:

['ATT','TTCC']


Any suggestions where I am going wrong?

In my actual code I am first transposing the lists:

seq_list = [['TCCGGGGGTATC', 'TCCGTGGGTATC', ...]] # one nested list

numofpops = len(seq_list)

### Transposing. Moving along the columns only

#column_list = []
for k in range(len(seq_list)):
    column_list = [[] for i in range(len(seq_list[k][0]))]
    for seq in seq_list[k]:
        for i, nuc in enumerate(seq):
            column_list[i].append(nuc)
    ddd = column_list
    print(ddd)

tt = ["".join(y for y in x if y in {'A','G','T','C'}) for x in ddd]
print(tt)

Answer

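Your actual code is discarding lists. column_list is rebuilt on every pass through the outer loop, but tt is computed only once, after the loop has finished, so you only ever process the last entry.

Your code works fine otherwise. Do the filtering and joining inside the loop, then append each result to a final list: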

results = []

for k in range(len(seq_list)):
    column_list = [[] for i in range(len(seq_list[k][0]))]
    for seq in seq_list[k]:
        for i, nuc in enumerate(seq):
            column_list[i].append(nuc)
    # process `column_list` here, in the loop (no need to assign to ddd)
    tt = ["".join(y for y in x if y in {'A','G','T','C'}) for x in column_list]

    results.append(tt)
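
For the small sample from the question, where no transposition is needed, the same "one result per group" idea can be sketched as a nested comprehension. This assumes the sample is wrapped in one outer list; the names ALLOWED and join_valid are made up for illustration:

```python
# The sample from the question, wrapped in one outer list (an assumption).
sample = [
    [['A', 'T', 'N', 'N'], ['T', 'C', 'C', 'C']],
    [['A', 'T', 'T', 'N'], ['T', 'T', 'C', 'C']],
]

# Hypothetical name: the set of letters to keep.
ALLOWED = {'A', 'G', 'T', 'C'}

def join_valid(group):
    """Join each inner list into a string, keeping only allowed letters."""
    return ["".join(ch for ch in row if ch in ALLOWED) for row in group]

# One result list per group, so nothing is overwritten or discarded.
results = [join_valid(group) for group in sample]
print(results)  # [['AT', 'TCCC'], ['ATT', 'TTCC']]
```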

Note that you could use the zip() function instead of building the transposed lists by hand:

results = []
for sequence in seq_list:
    # zip(*sequence) yields one tuple per column of the sequences
    tt = [''.join([y for y in column if y in 'AGTC']) for column in zip(*sequence)]
    results.append(tt)
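
To see what the zip() transposition produces, here is a minimal sketch on made-up data. The sequences and the helper name clean_columns are hypothetical, not taken from the question:

```python
# Hypothetical data: two populations, each a list of equal-length sequences.
seq_list = [
    ["ATN", "TCN"],
    ["ANT", "TTC"],
]

def clean_columns(sequences, allowed="AGTC"):
    """Transpose the sequences and drop characters not in `allowed`."""
    # zip(*sequences) yields one tuple per column, e.g. ('A', 'T').
    return ["".join(c for c in column if c in allowed)
            for column in zip(*sequences)]

# One list of cleaned columns per population.
results = [clean_columns(sequences) for sequences in seq_list]
print(results)  # [['AT', 'TC', ''], ['AT', 'T', 'TC']]
```

Keep in mind that zip() stops at the shortest input, so sequences of unequal length would be silently truncated to the shortest one.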