Chris Baker Chris Baker - 27 days ago 6
SQL Question

High-performance multi-tier tag filtering

I have a large database of artists, albums, and tracks. Each of these items may have one or more tags assigned via glue tables (track_attributes, album_attributes, artist_attributes). There are several thousand (or even hundred thousand) tags applicable to each item type.

I am trying to accomplish two tasks, and I'm having a very hard time getting the queries to perform acceptably.

Task 1) Get all tracks that have any given tags (if provided) by artists that have any given tags (if provided) on albums with any given tags (if provided). Any set of tags may not be present (i.e. only a track tag is active, no artist or album tags)

Variation: The results are also presentable by artist or by album rather than by track

Task 2) Get a list of tags that are applied to the results from the previous filter, along with a count of how many tracks have each given tag.

What I am after is some general guidance in approach. I have tried temp tables, inner joins, IN(), all my efforts thus far result in slow responses. A good example of the results I am after can be seen here: http://www.yachtworld.com/core/listing/advancedSearch.jsp, except they only have one tier of tags, I am dealing with three.

Table structures:

Table: attribute_tag_groups
Column | Type |
------------+-----------------------------+
id | integer |
name | character varying(255) |
type | enum (track, album, artist) |

Table: attribute_tags
Column | Type |
--------------------------------+-----------------------------+
id | integer |
attribute_tag_group_id | integer |
name | character varying(255) |

Table: track_attribute_tags
Column | Type |
------------+-----------------------------+
track_id | integer |
tag_id | integer |

Table: artist_attribute_tags
Column | Type |
------------+-----------------------------+
artist_id | integer |
tag_id | integer |

Table: album_attribute_tags
Column | Type |
------------+-----------------------------+
album_id | integer |
tag_id | integer |

Table: artists
Column | Type |
------------+-----------------------------+
id | integer |
name | varchar(350) |

Table: albums
Column | Type |
------------+-----------------------------+
id | integer |
artist_id | integer |
name | varchar(300) |

Table: tracks
Column | Type |
-------------+-----------------------------+
id | integer |
artist_id | integer |
album_id | integer |
compilation | boolean |
name | varchar(300) |


EDIT I am using PHP, and I am not opposed to doing any sorting or other hijinx in script, my #1 concern is speed of return.

Answer Source

You probably should try to denormalize your data. Your structure is optimised for insert/update load, but not for queries. As I got it, your will have much more select queries than insert/update queries.

For example you can do something like this:

store your data in normalized structure.

create agregate table like this

  track_id, artist_tags, album_tags, track_tags
   1 , jazz/pop/,  jazz/rock, /heavy-metal/  

    or 

    track_id, artist_tags, album_tags, track_tags
    1 , 1/2/,  1/3, 4/

to spead up search you probably should create FULLTEXT index on *_tags columns

query this table with sql like

select * from aggregate where album_tags  MATCH (track_tags) AGAINST ('rock')

rebuild this table incrementally once a day.