data_garden - 1 year ago 68
Python Question

# python - top matches between nested dictionaries

I have a data structure of playlists:

``````users={
'playlist1': {'Karma Police': 2.0,'Bittersweet Symphony': 3.0,'The Queen Is Dead':4.0,'Song 1': 1.0},
'playlist2': {'Bittersweet Symphony': 1.0,'Karma Police': 1.0,'The Queen Is Dead': 7.0,'Song 2': 1.0 },
'playlist3': {'Karma Police': 4.0,'Bittersweet Symphony': 4.0,'The Queen Is Dead':3.0,'Song 3': 1.0}
}
``````

which is being passed to this function:

``````def sim_distance(users,playlist1,playlist2):
'''
Returns a distance-based similarity score for
playlist1 and playlist2
'''
# Get the list of shared_items
si={}
for item in users[playlist1]:
if item in users[playlist2]:
si[item]=1
# if they have no ratings in common, return 0
if len(si)==0: return 0
# Add up the squares of all the differences
sum_of_squares=sum([pow(users[playlist1][item]-users[playlist2][item],2)
for item in users[playlist1] if item in users[playlist2]])
return 1/(1+sum_of_squares)

#print sim_distance(users, 'playlist1', 'playlist2')
``````

lastly, the above
`function`
is part of another one:

``````def topMatches(users,playlist,n=2,similarity=sim_distance):

'''
Returns the best matches for user from
the prefs dictionary.
Number of results and similarity function
are optional params.
'''

scores=[(similarity(users,playlist,other),other)
for other in users if other!=playlist]
# Sort the list so the highest scores appear at the top
scores.sort( )
scores.reverse( )
return scores[0:n]
``````

`topMatches(users, 'playlist1')`

prints:
`[(0.14285714285714285, 'playlist3'), (0.06666666666666667, 'playlist2')]`

but If I have a much more nested structure, like so:

``````playlists_user1={'user1':[
{'playlist1A':{
'tracks': [
{'name': 'Bitter Sweet Symphony','artist': 'The Verve','count': 2.0},
{'name': 'Song 1a','artist': 'Band 1a','count': 2.0}
]
}
},
{'playlist1B':{
'tracks': [
{'name': 'We Will Rock You','artist': 'Queen', 'count': 3.0},
{'name': 'Roxanne','artist': 'Police','count': 5.0},
{'name': 'Song 1b','artist': 'Band 1b','count': 2.0}
]
}
}
]
}
playlists_user2={'user2':[
{'playlist2A':{
'tracks': [
{'name': 'Bitter Sweet Symphony','artist': 'The Verve','count': 4.0},
{'name': 'Song 2a','artist': 'Band 2a','count': 2.0}
]
}
},
{'playlist2B':{
'tracks': [
{'name': 'We Will Rock You','artist': 'Queen', 'count': 4.0},
{'name': 'Roxanne','artist': 'Police','count': 1.0},
{'name': 'Song 2b','artist': 'Band 2b','count': 2.0}
]
}
}
]
}
playlists_user3={'user3':[
{'playlist3A':{
'tracks': [
{'name': 'Bitter Sweet Symphony','artist': 'The Verve','count': 6.0},
{'name': 'Song 3a','artist': 'Band 3a','count': 1.0}
]
}
},
{'playlist3B':{
'tracks': [
{'name': 'We Will Rock You','artist': 'Queen', 'count': 8.0},
{'name': 'Roxanne','artist': 'Police','count': 3.0},
{'name': 'Song 3b','artist': 'Band 3b','count': 4.0}
]
}
}
]
}
``````

how do I correct:

1)
`sim_distance(users,playlist1,playlist2)`

and

2)
`topMatches(users,playlist,n=2,similarity=sim_distance)`

in order to adapt my code to this new nested structure?

I'm looking for top
`matches`
for
`'playlist1A'`
and
`'playlist1B'`
in
`user2`
and
`user3`
, top matches for
`'playlist2A'`
and
`'playlist2B'`
in
`user1`
and
`user3`
and so forth.

I made a few assumptions based on the last data you posted. The code is untested and I had to break from your list comprehension into ugly for loops. The dictionaries were nested in annoying ways so I couldn't make it cleaner

``````import operator

def sim_distance(playlist1, playlist2):
"""
Returns a distance-based similarity score for
playlist1 and playlist2
"""
# Flatten playlists
playlist1, playlist2 = list(playlist1.values())[0], list(playlist2.values())[0]
sum_of_squares = 0.
for i in range(len(playlist1['tracks'])):
for j in range(len(playlist2['tracks'])):
if playlist1['tracks'][i]['name'] == playlist2['tracks'][j]['name']:
sum_of_squares += (playlist1['tracks'][i]['count'] - playlist2['tracks'][j]['count'])**2

# if they have no ratings in common, return 0
if (sum_of_squares < 10e-10): return 0.

return 1/(1+sum_of_squares)

def topMatches(users, playlist, n=2, similarity=sim_distance):

'''
Returns the best matches for a playlist of a user
The candidates are all other playlists of others users.
users is a list of dictionaries.

This code assumes that each playlist has a different name
Number of results and similarity function
are optional params.
'''
playlist_name = list(playlist.keys())[0]
scores = {}
for user in users:
for other_playlist in list(user.values())[0]:
other_name = list(other_playlist.keys())[0]
# Making sure not to compare the playlist with itself
if  playlist_name != other_name:
scores[other_name] = sim_distance(playlist, other_playlist)

# Sort the list so the highest scores appear at the top
sorted_scores = sorted(scores.items(), key=operator.itemgetter(1))
sorted_scores.reverse()
return sorted_scores[0:n]

#Playsts_userX as defined in the question
users = [playlists_user1, playlists_user2, playlists_user3]
'count': 1.0,
'name': 'Karma Police'},
{'artist': 'The Verve', 'count': 2.0, 'name': 'Bitter Sweet Symphony'},
{'artist': 'Band 1a', 'count': 2.0, 'name': 'Song 1a'}]}}
topMatches(users, test_playlist, n=8)
``````

gives

``````[('playlist2A', 0.07692307692307693),
('playlist3A', 0.013157894736842105),
('playlist1B', 0.0),
('playlist3B', 0.0),
('playlist2B', 0.0)]
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download