seibzehn seibzehn - 23 days ago 6
MySQL Question

Prevent duplicates using INNER JOIN and ON with this SQL query

Please refer to below SQL query. I am trying to create a simple activity feed that will match the person who is on record in the

updated_by
and/or
created_by
fields (whichever is newer) in a tasks table with their profile picture in a users table. This query works, except I get duplicates when the two fields have different users because in the ON clause both elements of the OR statement will return true as there is always a
created_by
that will match someone in the users table - I cannot figure out how to limit this to just one when both fields are filled with different users. I've tried adding more CASE WHENs to the ON clause but there is no if/else logic to it so it just defaults to both being true again and returns the same results.

(Note that I need to maintain the ability to limit results in the WHERE clause with status items which are passed in as variable, the query is contained within a PHP function.)

Ideas?

SELECT tasks.*, users.profile_pic,
CASE WHEN tasks.updated_at > tasks.created_at
THEN tasks.updated_at
ELSE tasks.created_at
END AS recent_activity
FROM tasks
INNER JOIN users
ON tasks.updated_by = users.name OR tasks.created_by = users.name
WHERE status <> :status1 AND status <> :status2
ORDER BY recent_activity DESC LIMIT 10'

Answer

You can emulate a lot of "if" logic with OR and AND like so:

ON (tasks.updated_at > tasks.created_at AND tasks.updated_by = users.name)
   OR (tasks.updated_at <= tasks.created_at AND tasks.created_by = users.name)

or use logic similar to what you have in the select:

ON CASE 
      WHEN tasks.updated_at > tasks.created_at 
      THEN tasks.updated_by 
      ELSE tasks.created_by
      END = users.name

Alternatively (since OR and more complicated comparisons like the above pretty much rule out any benefits of indexing), you could join to the users table twice and pick the prioritized value in the select:

SELECT tasks.*
  , CASE WHEN tasks.updated_at > tasks.created_at 
    THEN uu.profile_pic
    ELSE cu.profile_pic
  END AS recent_profile_pic 
  , CASE WHEN tasks.updated_at > tasks.created_at 
    THEN uu.name 
    ELSE cu.name 
  END AS recent_activity 
FROM tasks
LEFT JOIN users AS uu ON tasks.updated_by = uu.name 
INNER JOIN users AS cu ON tasks.created_by = cu.name 
WHERE status <> :status1 AND status <> :status2 
ORDER BY recent_activity DESC LIMIT 10

This is assuming all the fields you used that did not specify the table they were a part of were from tasks and the extra join will not cause ambiguities.