Rémi Levassor Rémi Levassor - 7 months ago 22
SQL Question

Composite key VS primary key + not unique index

Here is what I have :

table content : cat_id product_id data1 data2 etc.


the categories are not unique obviously.
the product ids are unique.

2 queries :
1 -- SELECT * WHERE cat_id = :cat - must be as quick as possible
2 -- SELECT * WHERE product_id = :prodId
In second select, I can add : AND cat_id = :cat


What is the more efficient ?


  • 1 - index (not unique) on cat_id (good for select 1)

  • 2 - primary key on product_id (unique -> excellent for select 2)

  • 3 - index (not unique) on cat_id + PK on product_id (good for 1 & 2 separately)

  • 4 - unique constraint with composite [cat_id+product_id] (good for 1 & 2 together)

  • 5 - same as 4, but defining the composite as PK

  • 6 - composite (4 or 5) + single index/PK



For information, I'll have around 20 products in each category and a lot of categories (say 3000) - And ONE product belongs to only ONE category - In fact, that is not really cats and products, that is for the simplicity of explaination;)

thanks!

Answer

A database without Primary key is only half dressed and according to you product_id is an ideal candidate for a primary key, so let choose that. The primary key will be used in

SELECT * WHERE product_id = :prodId

It does not matter if and cat_id = :cat_id becomes a part of the query or not unless you have thousands of cat_ids associated with each product_id.

Then choose an index on cat_id. This will be used on

SELECT * WHERE cat_id = :cat

This will be very quick if the cardinality of the data is good. That means there is a wide distribution of cat_ids in the table. The index of cat_id will not be used in the first query. Thus you have two different indexes and both queries can be expected to be really quick.

Since [cat_id+product_id] != [product_id+cat_id] when it comes to indexing, if you have only one composite index one or the other will be slow.

For example, suppose we had a composite index on (cat_id, product_id) now the following query cannot make use of this index.

 SELECT * FROM tablename WHERE product_id = :prodId

But both these queries can use the (cat_id, product_id) index

SELECT * FROM tablename WHERE cat_id = :cat_id and product_id = :prodId
SELECT * FROM tablename WHERE cat_id = :catId

So in summary. Choose 1 and 2. But if the number of cat_ids are small or there are lots of cat_ids associated with each product_id choose 4 but make sure that the primary key is also in place.