catchmeifyoucan catchmeifyoucan - 4 months ago 11
SQL Question

PostgreSQL - order by an array

I have 2 tables - course that contains id and name of the courses and tagCourse that contains tags for each course.

course tagcourse
------------ ----------------
PK id_course PK tag
name PK, FK id_course


I'd like to write a function that searches courses by given array of tags and returns them ordered by quantity of matching tags. However I don't know how to write it correctly and in an efficient way. Please help me.

ie.

CREATE OR REPLACE FUNCTION searchByTags(tags varchar[])
RETURNS SETOF.....
RETURN QUERY SELECT * FROM course c INNER JOIN tagcourse tc ON c.id_course = tc.id_course
WHERE ??? ORDER BY ???

END....

Answer
CREATE OR REPLACE FUNCTION search_by_tags(tags varchar[])
  RETURNS TABLE (id_course integer, name text, tag_ct integer) AS
$func$
   SELECT id_course, c.name, ct.tag_ct
   FROM  (
      SELECT tc.id_course, count(*)::int AS tag_ct
      FROM   unnest($1) x(tag)
      JOIN   tagcourse tc USING (tag)
      GROUP  BY 1                      -- first aggregate ..
      ) AS ct
   JOIN   course c USING (id_course)   -- .. then join
   ORDER  BY ct.tag_ct DESC            --  more columns to break ties?
$func$  LANGUAGE sql;
  • Use unnest() to produce a table from your input array, like already demonstrated by @Clodoaldo.

  • You don't need plpgsql for this. Simpler with a plain SQL function.

  • I use unnest($1) (with positional parameter) instead of unnest(tags), since the later is only valid for PostgreSQL 9.2+ in SQL functions (unlike plpgsql). I quote the manual here:

In the older numeric approach, arguments are referenced using the syntax $n: $1 refers to the first input argument, $2 to the second, and so on. This will work whether or not the particular argument was declared with a name.

  • count() returns bigint. You need to cast it to int to match the declared return type or declare the the returned column as bigint to begin with.

  • Perfect occasion to simplify the syntax a bit with USING (equi-joins): USING (tag) instead of ON tc.tag = c.tag.

  • It's regularly faster to first aggregate, then join to another table. Reduces the needed join operations.
    As per question of @Clodoaldo in the comments, here is an SQL Fiddle to demonstrate the difference.

  • OTOH, if you aggregate after the join, you don't need a subquery. Shorter, but probably slower:

SELECT c.id_course, c.name, count(*)::int AS tag_ct
FROM   unnest($1) x(tag)
JOIN   tagcourse tc USING (tag)
JOIN   course     c USING (id_course)
GROUP  BY 1
ORDER  BY 3 DESC;  --  more columns to break ties?
Comments