Francesco Francesco - 25 days ago 8
MySQL Question

MYSQL/PHP find the most common item associate with a given item

I have thousands of user generated wish lists of items

the table is something like

collectionId | itemdId | user_id
-----------------------------------

123 | 2345 | 1
123 | 3465 | 1
123 | 876 | 1 // <---
123 | 567 | 1
123 | 980 | 1 // <---

777 | 980 | 2 // <---
777 | 332 | 2
777 | 3465 | 2
777 | 876 | 2 // <---
777 | 678 | 2
777 | 567 | 2
... ... ...
etc..


you see item 876 and 980, are included in both collections (777 and 123) so they are a popular couple/pair

So say I visit the page of item 876
I'd like to display to my users that a very common item associated/wished together with it, it's the item 980 (of course this is based on users'taste)


Think for a moment what Amazon does, if you see a white iphone i want
to suggest you a pink iphone cover because many other users have
suggested/favorited that together with the white iphone


in PHP I would probably do something loopy like in pseudo code

for total number of collection:

select all item from collection 1

select all item from collection 2
do array_interesct (c1,c2)
store the matching items
repeat...

select all item from collection 2
do array_interesct (c1,c3)
store the matching items
repeat...

...then elect all item from collection 2 and repeat all the iterations..


but i'm wondering if this can be accomplished with MYSQL only

Answer

Start with a query that gets all the collections that contain the item you selected:

SELECT collectionId
FROM wishLists
WHERE itemId = 876

From this, you want to get all the other itemIds in those collections.

SELECT itemId
FROM wishLists
WHERE collectionId IN (above query)
AND itemId != 876

This can be rewritten as a join:

SELECT a.itemId
FROM wishLists AS a
JOIN wishLists AS b ON a.collectionId = b.collectionId
WHERE a.itemId != 876 AND b.itemId = 876

Now you can count the repetitions of this to find the most common ones:

SELECT a.itemId
FROM wishLists AS a
JOIN wishLists AS b ON a.collectionId = b.collectionId
WHERE a.itemId != 876 AND b.itemId = 876
GROUP BY a.itemId
ORDER BY COUNT(*) DESC

Add a LIMIT n clause at the end to show the top n items.

Comments