marcel - 1 month ago
Python Question

Python KeyError when parsing a dictionary

How can I join these two text documents?


document 1:

1000001 10:0.471669 250:0.127552 30:0.218773 64:0.249413
1000002 130:0.0839656 107:0.185613 30:0.446355 110:0.38011
1000003 1:0.0835855 1117:0.0647112 302:0.0851354 46:0.0601825 48:0.098907 516:0.167713


document 2:

1000001 161:0.115664 207:0.136537 294:0.0974809 301:0.199868
1000002
1000003 555:0.0585849 91:0.0164101


result:

1000001 10:0.471669 250:0.127552 30:0.218773 64:0.249413 161:0.115664 207:0.136537 294:0.0974809 301:0.199868
1000002 130:0.0839656 107:0.185613 30:0.446355 110:0.38011
1000003 1:0.0835855 1117:0.0647112 302:0.0851354 46:0.0601825 48:0.098907 516:0.167713 555:0.0585849 91:0.0164101


explanation:

Document 1 and document 2 have the same structure and the same number of lines.
Each line starts with a number (the same number in both documents), followed by several items, each made up of a number, a colon, and a decimal number, for example 10:0.471669.

These items are unique. What I want to do is merge them: take the items from each line of the second document and append them to the corresponding line of the first document.

note:

The initial number and the items that follow it are all separated from one another by a single space.

Update

Here is my attempt:


dat1 = {}
with open('doc1') as f:
    for line in f.readlines():
        dat1[line.split(' ')[0]] = line.strip().split(' ')[1:]

dat2 = {}
with open('doc2') as f:
    for line in f.readlines():
        key = line.split(' ')[0]
        dat2[key] = line.split(' ')[1:]

for key in dat1.keys():
    print("%s %s %s" % (key, str.join(' ', dat1[key]), str.join(' ', dat2[key])))


I get a KeyError traceback for lines of the second document that have nothing to add to the first document, as is the case for the second line of the second document in the example above.
How can I avoid this exception, i.e. skip the lines that have only the key and nothing else to add?

Answer

An easier way might be to use a defaultdict of lists:

from collections import defaultdict

data = defaultdict(list)

for filename in 'stem.data', 'stem.info':
    with open(filename) as f:
        for line in f:
            key, _, value = line.partition(' ')
            data[key.strip()].append(value.strip())

for key in sorted(data):
    print key, ' '.join(data[key])    # Python 2
#    print(key, *data[key])            # Python 3

Regarding the printing of the result you could add:

from __future__ import print_function

to the top of your file, and then the Python 3 print() function will be available in Python 2, i.e. you can use the Python 3 print above.
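
As for the KeyError itself, it most likely comes from the newline. A line that holds only the key contains no space, so split(' ')[0] returns the key with its trailing '\n' still attached, and that string does not match the clean key stored from the first document. A minimal sketch of the mismatch, using in-memory strings instead of your files (the sample lines are shortened from the question, and it assumes the key-only line has no trailing space):

```python
# One line from doc1 and the key-only line from doc2, as they come out of a file
line1 = "1000002 130:0.0839656 107:0.185613\n"
line2 = "1000002\n"

key1 = line1.split(' ')[0]   # text before the first space
key2 = line2.split(' ')[0]   # no space in the line, so the newline stays in the key

print(repr(key1))
print(repr(key2))
print(key1 == key2)                          # the two keys differ
print(key1 == line2.strip().split(' ')[0])   # stripping before splitting makes them match
```

Calling strip() on the line before splitting it, as your first loop already does for the values, should be enough to make the keys line up.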


You asked in a comment how to print to a file (Python 3, or Python 2 after importing print_function):

with open('outfile.txt', 'w') as f:
    for key in sorted(data):
        print(key, *data[key], file=f)
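
If you would rather stay close to your original two-dictionary attempt and keep the line order of the first document, something along these lines might work. It is only a sketch: the function names parse and merge are made up for illustration, and the sample lines are shortened from the question. It relies on split() without arguments, which drops the newline, and on dict.get with an empty-list default, so a key that is missing or has no items never raises KeyError.

```python
def parse(lines):
    """Map each line's leading key to its list of items (possibly empty)."""
    table = {}
    for line in lines:
        parts = line.split()   # splits on any whitespace and drops the newline
        if parts:              # ignore completely blank lines
            table[parts[0]] = parts[1:]
    return table


def merge(lines1, lines2):
    """Append the items found in lines2 to the matching lines of lines1."""
    extra = parse(lines2)
    merged = []
    for line in lines1:
        parts = line.split()
        if not parts:
            continue
        # get() falls back to an empty list when the key has nothing to add
        merged.append(' '.join(parts + extra.get(parts[0], [])))
    return merged


doc1 = ["1000001 10:0.471669 250:0.127552\n", "1000002 130:0.0839656\n"]
doc2 = ["1000001 161:0.115664\n", "1000002\n"]

for row in merge(doc1, doc2):
    print(row)
```

Both arguments can be any iterable of lines, so with real files you could pass the open file objects for 'doc1' and 'doc2' instead of the lists.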