Neeraj Kumar Neeraj Kumar - 6 months ago 10
SQL Question

Facebook or LinkedIn like connection suggestion profile alert

I have a "users" table like this:

+-----+------------+---------+---------------+
| uid | first_name | surname | email |
+-----+------------+---------+---------------+
1 joe bloggs joe@test.com
2 bill bloggs bill@test.com
3 john bloggs john@test.com
4 karl bloggs karl@test.com


and a "connections" table like this:

+----+---------+----------+--------+
| id | user_id | user_id2 | status |
+----+---------+----------+--------+
1 1 3 1
2 3 1 1
3 4 3 1
4 3 4 1
5 2 3 1
6 3 2 1


Here id is auto auto_increment user id is saved in either user_id or user_id2. Status 1 means connection is approved and active.

Now I want to send an email alert to users with profile suggestion like Facebook or LinkedIn do. I assume it is possible to get mutual connections between users but not sure how to do. I have tried but it is not perfect. I want to get these all with one mysql query with user and their suggested connection profile. Any idea how to do this?

Many thanks in advance!

Answer

Such algorithms are never perfect: you can never know exactly if two people know each other. People might live in the same building, go to the same work, have 100 friends in common and even share the same hobbies without knowing each other (of course the odds are not that great).

What Social networks do exactly is of course unknown (that's part of the way they make money). But some aspects are known. For instance the number of mutual friends are important (together with for instance location, interests, hobbies, education, work, surname,...).

Based on what you provide, one can more or less only use the number of mutual friends. This can be done using the following query:

SELECT a.user_id, b.user_id2, count(*) --Select the two ids and count the number of transitive relations
FROM connections as a, connections as b --Use the table twice (transitivity)
WHERE a.user_id2 = b.user_id  -- Transitivity constraint
      AND a.user_id < b.user_id2 -- Maintain strict ordening (can be dropped when checked)
      AND a.status = 1 -- First relation must be confirmed.
      AND b.status = 1 -- Second connection must be confirmed.
      AND NOT EXISTS ( -- Not yet friends
          SELECT *
          FROM connections as c
          WHERE c.user_id = a.user_id
                AND c.user_id2 = b.user_id2
      )
GROUP BY a.user_id, b.user_id2 -- Make sure we count them correctly.

As you can see here, the fiddle calculates that (1,2), (1,4) and (2,4) are not yet friends, and all have one mutual friend.

Once the number of mutual friends surpasses a certain threshold, one can propose friendship.

I would however advice you to make your table more compact: add a CHECK to the table such that user_id is always strictly less than user_id2 (CHECK(user_id < user_id2)). This makes the database more compact, for most implementations of a database tool faster as well and queries will become simpler. What is after all the difference between (1,3,1) and (3,1,1).