Ejonas Ejonas - 4 months ago 12
SQL Question

Postgres: Reference other specific rows from a certain row

I am a postgres newbie, I am struggling to implement this:


  • I have a table of artists and their info (birth, death, style, etc..)

  • Each row has a unique
    id_name
    varchar column has values like
    jones
    or
    adams

  • I have
    similar_artists
    column which is supposed to be an array of varchars containing
    id_name
    values

  • I need to make sure that when I add an id_name to
    similar_artists
    in some row. The other row adds the first id_name automatically in its
    similar_artists
    (i.e. if artist
    A
    has
    B
    in his
    similar_artists
    , so artist
    B
    must have
    A
    in his
    similar_artists
    too).

  • This
    similar_artists
    column logically contains a variable number of entries for each row (e.g. artist
    A
    my have similar artists
    B
    ,
    C
    and artist
    M
    has
    P
    ,
    O
    ,
    X
    ,
    Y
    ,
    Z
    )



What is the best way to implement this? I am currently struggling to do it using trigger procedures and arrays since postgres natively supports them but can't be sure if this is even the judicious way.

Answer

Triggers are a valid approach, and while this could definitely be made to work with arrays I suspect you'll find using an associative (also called a junction) table to be a more standard and more recommended approach to actually hold the relationship data.

Basically you're going to want to make your table of artists and their info similar to what you have now but remove the similar_artists column. It's also a better idea to use a SERIAL PRIMARY KEY rather than a string for your id_name.

You'll then make a second table representing your similar artists relationships. This second table can be created in a couple of different ways each of which will have advantages and disadvantages.

One way is to make it with three columns: a unique ID (in PostgreSQL using a serial primary key isn't a bad idea) and two columns representing artists that each has a foreign key constraint (using REFERENCES) that points back to that primary key used in the artists table. If you do it this way you only need a single entry to represent a link between two artists; you'll have to make the logic slightly more complicated (that is, check both columns) on queries but since both columns can always be fetched in a single query it won't be a huge impact on performance. You could modify this slightly to simplify logic at the expense of space by including two rows for each artist and only track the relationship in a single direction for each row.

Another way is to make it with three columns using the same unique ID as above, a second integer relationship ID, and a third column representing an artist with the foreign key constraint etc. as described above. In this case, two rows would be needed to relate two artists and would work by joining them to a common relationship. If you have more artists that relate to these two, it's just a matter of adding a new row for each mapping to that same relationship. For lots of artists related in lots of big globs, this approach can be more efficient than the approach explained above.

There are other ways besides, but you get the idea. The key point is that by making use of foreign key constraints in these tables you're getting the database system to do more of the work for you.

However you chose to lay it out, once you've got your tables designed you can then build a trigger function to perform your additional INSERT or two depending on design. It's important to design those tables and determine your approach carefully though before you try to write the trigger functions. If you're not familiar with the concept of database normalization you'll want to take a crash course on it before you make your final decisions on table design, too.