xyz xyz - 1 month ago 6
Python Question

I'm getting a KeyError while parsing a data file in Python

def loadMovieLens(path='C:\Users\karan\Desktop\ml-100k'):
    # Load data
    prefs={}
    for line in open(path+'/new1.data'):
        (user,title,rating,ts)=line.split('\t')[0:4]
        prefs[user][title]=float(rating)
    return prefs


I'm getting a KeyError while parsing the file.

Answer

Your dictionary has no keys yet, so prefs[user] doesn't exist and prefs[user][title] = ... raises a KeyError. You can have Python add a default value for missing keys by using the dict.setdefault() method:

prefs.setdefault(user, {})[title] = float(rating)

The above tells prefs to add {} (an empty dictionary) as a value for the key named in user if that key doesn't exist yet. Either way, the existing or new value is then returned.
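
As a small illustration of that behavior, here is a sketch using made-up user and title values (they are placeholders, not taken from the asker's file):

```python
prefs = {}

# First call: the key '196' is missing, so setdefault stores {} and returns it
inner = prefs.setdefault('196', {})
inner['242'] = 3.0

# Second call: the key exists, so the existing inner dict is returned unchanged
prefs.setdefault('196', {})['302'] = 4.0

print(prefs)  # {'196': {'242': 3.0, '302': 4.0}}
```

Because setdefault() returns the stored dictionary itself, assigning through the returned value updates prefs directly.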

With a few small improvements, the complete function then becomes:

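import os

def loadMovieLens(path=r'C:\Users\karan\Desktop\ml-100k'):
    # Raw string: in Python 3, '\U' in a normal string literal is a syntax error
    prefs = {}
    with open(os.path.join(path, 'new1.data')) as f:
        for line in f:
            # Split at most 4 times and keep the first four fields
            user, title, rating, ts = line.split('\t', 4)[:4]
            # Create the inner dict for a new user, then store the rating
            prefs.setdefault(user, {})[title] = float(rating)
    return prefs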

I added a with statement (so the file is closed properly when reading is done), used os.path.join() to build the path (so it handles path separators independent of the current operating system) and limited splitting to 4 times.

You could switch to the csv module to handle splitting on tabs too.
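
A minimal sketch of the csv variant, assuming the same four tab-separated columns. The function name load_prefs and the sample rows are hypothetical, and an in-memory io.StringIO stands in for the real file so the example runs on its own:

```python
import csv
import io

def load_prefs(lines):
    """Build {user: {title: rating}} from an iterable of tab-separated lines."""
    prefs = {}
    # csv.reader accepts any iterable of strings, including an open file
    reader = csv.reader(lines, delimiter='\t')
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed rows
        user, title, rating = row[:3]
        prefs.setdefault(user, {})[title] = float(rating)
    return prefs

# Made-up sample data standing in for new1.data
sample = io.StringIO(
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "196\t51\t5\t881250949\n"
)
prefs = load_prefs(sample)
print(prefs)  # {'196': {'242': 3.0, '51': 5.0}, '186': {'302': 3.0}}
```

With a real file, you would pass the file object instead, opened with newline='' as the csv documentation recommends, for example open(os.path.join(path, 'new1.data'), newline='').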