John Baum John Baum - 1 month ago 5
SQL Question

Querying across multiple tables avoiding a union all

I have the following DB tables that I am trying to query:

t_shared_users
user_id
user_category
folder_id
expiry

t_documents
id
folder_id
user_id
user_category
description
created
updated

t_folder
id
type
user_id
user_category
created
updated


I would like to find all the documents you own and have shared access to. ie. search for all documents in t_documents where user_id = 1 AND user_category = 100 but also include those documents in the folder you have access to in t_shared_users. Here is my attempt at the query:

SELECT
id,
folder_id,
user_id,
user_category,
description,
created,
updated
FROM
t_documents
WHERE
user_category = 100
AND user_id = 1

UNION ALL

SELECT
d.id,
d.folder_id,
d.user_id,
d.user_category,
d.description,
d.created,
d.updated
FROM
t_documents d
JOIN
t_shared_users s
ON
d.folder_id = s.folder_id
WHERE
d.user_category = 100
d.AND user_id = 1

ORDER BY
created ASC
LIMIT
10


Is there any better/more performant/concise way to write this query? The above seems a little verbose and slow.

edit:

CREATE TABLE t_folder (
id SERIAL NOT NULL,
user_category SMALLINT NOT NULL,
user_id INTEGER NOT NULL,
type INTEGER NOT NULL,
description TEXT,
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
PRIMARY KEY (id)
);

CREATE TABLE t_documents (
id BIGSERIAL NOT NULL,
folder_id INTEGER,
user_category SMALLINT NOT NULL,
user_id INTEGER NOT NULL,
description TEXT NOT NULL,
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
PRIMARY KEY (id)
);

CREATE TABLE t_shared_users (
id SERIAL,
folder_id INTEGER NOT NULL,
user_category INTEGER NOT NULL,
user_id INTEGER NOT NULL,
expiry TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
PRIMARY KEY (id)
);

Answer

This is your query:

  SELECT
    id,
    folder_id,
    user_id,
    user_category,
    description,
    created,
    updated
  FROM
    t_documents
  WHERE
    user_category = 100
    AND user_id = 1

  UNION ALL

  SELECT
    d.id,
    d.folder_id,
    d.user_id,
    d.user_category,
    d.description,
    d.created,
    d.updated
  FROM
    t_documents d
  JOIN
    t_shared_users s
  ON
    d.folder_id = s.folder_id
  WHERE
    d.user_category = 100
    AND d.user_id = 1 -- your query actually has a typo here

What I don't understand about the above query is why you are filtering on d.user_category and d.user_id (t_documents table) in the bottom part of the query. Are you sure you didn't mean s.user_category and s.user_id (t_shared_users)? If not, what is the point of joining with t_shared_users?

Assuming that I am correct that your query is in error, this is how I would rewrite it:

select d.*
  from t_documents d
 where d.user_category = 100
   and d.user_id = 1
 union
select d.*
  from t_shared_users s
  join t_documents d
    on d.folder_id = s.folder_id
 where s.user_category = 100
   and s.user_id = 1

Notice that I use union instead of union all, as I believe it's technically possible to get possibly unwanted duplicate documents otherwise.

Also, just as a rough approximation, these are the indexes I would define for good performance:

  • t_documents (user_id, user_category)
  • t_documents (folder_id)
  • t_shared_users (user_id, user_category, folder_id)
Comments