Tiago Minuzzi - 2 months ago
Python Question

Python - Comparing files delimiting characters in line

Hi there.
I'm a beginner in Python and I'm struggling to do the following:

I have a file like this (10k+ lines):

EgrG_000095700 /product="ubiquitin carboxyl terminal hydrolase 5"
EgrG_000095800 /product="DNA polymerase epsilon subunit 3"
EgrG_000095850 /product="crossover junction endonuclease EME1"
EgrG_000095900 /product="lysine specific histone demethylase 1A"
EgrG_000096000 /product="charged multivesicular body protein 6"
EgrG_000096100 /product="NADH ubiquinone oxidoreductase subunit 10"


and this one (600+ lines):

EgrG_000076200.1
EgrG_000131300.1
EgrG_000524000.1
EgrG_000733100.1
EgrG_000781600.1
EgrG_000094950.1


All the IDs in the second file are also in the first one, so I want the lines of the first file that correspond to the IDs in the second one.

I wrote the following script:

f1 = open('egranulosus_v3_2014_05_27.tsv').readlines()
f2 = open('eg_es_final_ids').readlines()
fr = open('res.tsv','w')

for line in f1:
    if line[0:14] == f2[0:14]:
        fr.write('%s'%(line))

fr.close()
print "Done!"


My idea was to compare the first 14 characters of each line (the EgrG_XXXX ID) in one file against the other, and then write the matching lines to a new file.
I tried some modifications; this is just the "core" of my idea.
The script above writes nothing. With one of the modifications, I got just one line.

Answer
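# Index the annotation file by ID (the first whitespace-separated field).
# Input file names follow the question; adjust them to your own files.
with open('egranulosus_v3_2014_05_27.tsv', 'r') as infile:
    line_storage = {}
    for line in infile:
        data = line.split()
        if not data:
            continue  # skip blank lines
        # Store the original line so it can be written back unchanged.
        line_storage[data[0]] = line

with open('eg_es_final_ids', 'r') as infile, open('my_output.txt', 'w') as outfile:
    for line in infile:
        # 'EgrG_000076200.1\n' -> 'EgrG_000076200'
        lookup_key = line.strip().split('.')[0]
        match = line_storage.get(lookup_key)
        if match:
            # Normalize the line ending in case the last line had none.
            outfile.write(match.rstrip('\n') + '\n')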
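
As for why the loop in the question writes nothing: f2 is a list of lines, so f2[0:14] is a list holding the first 14 lines, and a string such as line[0:14] never compares equal to a list. Below is a minimal sketch of an alternative, assuming Python 3 and assuming every ID in the second file looks like EgrG_XXXXXXXXX.1. It collects the IDs in a set and streams the first file, so matches come out in the first file's order. The function name filter_by_ids and the sample ID list are made up for illustration and are not from the question.

```python
def filter_by_ids(annotation_lines, id_lines):
    """Return the annotation lines whose first field is one of the wanted IDs."""
    # 'EgrG_000095700.1\n' -> 'EgrG_000095700': drop the newline and the '.1' suffix.
    wanted = {line.strip().split('.')[0] for line in id_lines if line.strip()}
    matches = []
    for line in annotation_lines:
        fields = line.split()
        # Skip blank lines; compare the whole first field, not a fixed-width slice.
        if fields and fields[0] in wanted:
            matches.append(line)
    return matches


# Sample data in the shape shown in the question (IDs picked so that two match).
annotations = [
    'EgrG_000095700 /product="ubiquitin carboxyl terminal hydrolase 5"\n',
    'EgrG_000095800 /product="DNA polymerase epsilon subunit 3"\n',
    'EgrG_000095850 /product="crossover junction endonuclease EME1"\n',
]
ids = ['EgrG_000095850.1\n', 'EgrG_000095700.1\n']

result = filter_by_ids(annotations, ids)
print(''.join(result), end='')
```

With real files, the two open file objects can be passed in directly, since iterating over a file yields one line at a time.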