cucurbit - 7 months ago
Python Question

transpose dict of lists and write to csv

I want to write a dict of lists to a tsv file. The problem is that I'm not able to transpose the lists.

I have the following dict of lists:


print d


defaultdict(<type 'list'>, {1: ['Genemark1.10973_g', 'missense_variant', 'MODERATE', 'scaffold_100', 305, '605', 'Asp', 'Gly', 'YES', 'NO', 'NO', 'NO'], 2: ['estExt_Genewise1Plus.C_1000001', 'disruptive_inframe_insertion', 'MODERATE', 'scaffold_100', 5002, '7172', 'Gly', '', 'YES', 'NO', 'NO', 'NO'], 3: ['fgenesh2_pm.100_#_3', 'inframe_insertion', 'MODERATE', 'scaffold_100', 10104, '265266', 'Leu', '', 'YES', 'NO', 'NO', 'NO'], 4: ['estExt_fgenesh2_pg.C_100178', 'inframe_deletion', 'MODERATE', 'scaffold_10', 711411, '351352', 'Gln', '', 'YES', 'NO', 'NO', 'NO'], 5: ['estExt_fgenesh2_pm.C_1060001', 'disruptive_inframe_deletion', 'MODERATE', 'scaffold_106', 5189, '832', 'Leu', 'del', 'YES', 'NO', 'NO', 'NO'], 6: ['Genemark1.10980_g', 'frameshift_variant', 'HIGH', 'scaffold_101', 10838, '313', 'Leu', 'fs', 'NO', 'YES', 'NO', 'NO'], 7: ['Genemark1.10973_g', 'missense_variant', 'MODERATE', 'scaffold_100', 2043, '26', 'Ile', 'Leu', 'YES', 'NO', 'NO', 'NO'], 8: ['fgenesh2_pm.104_#_2', 'stop_gained', 'HIGH', 'scaffold_104', 8574, '310', 'Tyr', '*', 'YES', 'NO', 'NO', 'NO']})


This is my function:

import csv
import sys
from itertools import izip_longest

def printAnn(d):
    rows = izip_longest(*d.values())
    w = csv.writer(sys.stdout, delimiter='\t', quoting=csv.QUOTE_NONE, lineterminator='\n')
    w.writerows(rows)


The output I'm getting:

Genemark1.10973_g estExt_Genewise1Plus.C_1000001 fgenesh2_pm.100_#_3 estExt_fgenesh2_pg.C_100178 estExt_fgenesh2_pm.C_1060001 Genemark1.10980_g Genemark1.10973_g fgenesh2_pm.104_#_2
missense_variant disruptive_inframe_insertion inframe_insertion inframe_deletion disruptive_inframe_deletion frameshift_variant missense_variant stop_gained
MODERATE MODERATE MODERATE MODERATE MODERATE HIGH MODERATE HIGH
scaffold_100 scaffold_100 scaffold_100 scaffold_10 scaffold_106 scaffold_101 scaffold_100 scaffold_104
305 5002 10104 711411 5189 10838 2043 8574
605 7172 265266 351352 832 313 26 310
Asp Gly Leu Gln Leu Leu Ile Tyr
Gly del fs Leu *
YES YES YES YES YES NO YES YES
NO NO NO NO NO YES NO NO
NO NO NO NO NO NO NO NO
NO NO NO NO NO NO NO NO


PS: I've tried izip_longest with a single list (not a dict of lists) and it worked fine. What am I missing?

Answer

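There is no need to zip here. Each list in the dict is already one row, and izip_longest(*d.values()) transposes those rows into columns, which is why your output is flipped. To have each element of a row appear in its own column, write the rows as they are.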
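To see why the zip call flips the table, here is a minimal sketch with two made-up rows standing in for d.values(). It uses Python 3, where izip_longest is named itertools.zip_longest; the row contents are hypothetical and only illustrate the shape of the data.

```python
from itertools import zip_longest  # Python 3 name for Python 2's izip_longest

# Two hypothetical rows standing in for d.values()
rows = [
    ["geneA", "missense_variant", "MODERATE"],
    ["geneB", "stop_gained", "HIGH"],
]

# Unpacking the rows into zip_longest groups the i-th elements together,
# so every resulting tuple is a column of the original data, not a row.
transposed = list(zip_longest(*rows))

for line in transposed:
    print("\t".join(line))
```

Each printed line holds one field from every record, which matches the layout in the question. Writing the rows directly, without the zip, keeps one record per line.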

import csv
import sys

def printAnn(d):
    w = csv.writer(sys.stdout, delimiter='\t', quoting=csv.QUOTE_NONE, lineterminator='\n')
    w.writerows(d.values())

printAnn(d)
Genemark1.10973_g       missense_variant        MODERATE        scaffold_100    305     605     Asp     Gly     YES     NO      NO      NO
estExt_Genewise1Plus.C_1000001  disruptive_inframe_insertion    MODERATE        scaffold_100    5002    7172    Gly             YES     NO      NO      NO
fgenesh2_pm.100_#_3     inframe_insertion       MODERATE        scaffold_100    10104   265266  Leu             YES     NO      NO      NO
estExt_fgenesh2_pg.C_100178     inframe_deletion        MODERATE        scaffold_10     711411  351352  Gln             YES     NO      NO      NO
estExt_fgenesh2_pm.C_1060001    disruptive_inframe_deletion     MODERATE        scaffold_106    5189    832     Leu     del     YES     NO      NO      NO
Genemark1.10980_g       frameshift_variant      HIGH    scaffold_101    10838   313     Leu     fs      NO      YES     NO      NO
Genemark1.10973_g       missense_variant        MODERATE        scaffold_100    2043    26      Ile     Leu     YES     NO      NO      NO
fgenesh2_pm.104_#_2     stop_gained     HIGH    scaffold_104    8574    310     Tyr     *       YES     NO      NO      NO
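One caveat: in Python 2 a dict does not guarantee any particular order for d.values(), so the rows may not come out in key order on every run. A minimal sketch of one way around that, assuming you want the rows sorted by their integer key, is below. It is written for Python 3; the function name write_rows, its out parameter, and the sample data are hypothetical.

```python
import csv
import io
import sys

def write_rows(d, out=sys.stdout):
    """Write each list in d as one tab-separated line, ordered by key."""
    w = csv.writer(out, delimiter='\t', quoting=csv.QUOTE_NONE, lineterminator='\n')
    for key in sorted(d):
        w.writerow(d[key])

# Small made-up sample, deliberately inserted out of key order
sample = {
    2: ['geneB', 'inframe_insertion', 'MODERATE', 5002, '', 'YES'],
    1: ['geneA', 'missense_variant', 'MODERATE', 305, 'Gly', 'YES'],
}

# Capture the output in memory so it can be inspected
buf = io.StringIO()
write_rows(sample, buf)
result = buf.getvalue()
```

Calling write_rows(sample) with no second argument prints to standard output, like the function above. Note that QUOTE_NONE with no escapechar raises an error if a field ever contains a tab, so this only suits data known to be free of the delimiter.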

If this is not what you wanted, please comment.
