jcollum jcollum - 1 year ago 74
SQL Question

in general, should every table in a database have an identity field to use as a PK?

This seems like a duplicate even as I ask it, but I searched and didn't find it. It seems like a good question for SO -- even though I'm sure I can find it on many blogs etc. out there. SO will have more of a debate than you can get on a blog.

I'm running into an issue with a join: getting back too many records. I think of this as "expansion". I added a table to the set of joins and the number of rows expanded, way too much. Usually when this happens I add a select of all the ID fields that are involved in the join. That way it's pretty obvious where the expansion is happening and I can change the ON of the join to fix it. Except in this case, the table that I added doesn't have an ID field. To me, this is a problem. But perhaps I'm wrong.

The question: should every table in a database have an IDENTITY field that's used as the PK? Are there any drawbacks to having an ID field in every table? What if you're reasonably sure this table will never be used in a PK/FK relationship?

Related, but not duplicate: http://stackoverflow.com/questions/932913/when-having-an-identity-column-is-not-a-good-idea

Apparently this debate has been going on for a while. Shoulda known.

This post (surrogate vs. natural keys) is also relevant.

Answer Source

There are two concepts that are close but should not be confused: IDENTITY and PRIMARY KEY

Every table (except for the rare conditions) should have a PRIMARY KEY, that is a value or a set of values that uniquely identify a row.

See here for discussion why.

IDENTITY is a property of a column in SQL Server which means that the column will be filled automatically with incrementing values.

Due to the nature of this property, the values of this column are inherently UNIQUE.

However, no UNIQUE constraint or UNIQUE index is automatically created on IDENTITY column, and after issuing SET IDENTITY_INSERT ON it's possible to insert duplicate values into an IDENTITY column, unless it had been explicity UNIQUE constrained.

The IDENTITY column should not necessarily be a PRIMARY KEY, but most often it's used to fill the surrogate PRIMARY KEYs

It may or may not be useful in any particular case.

Therefore, the answer to your question:

The question: should every table in a database have an IDENTITY field that's used as the PK?

is this:

No. There are cases when a database table should NOT have an IDENTITY field as a PRIMARY KEY.

Three cases come into my mind when it's not the best idea to have an IDENTITY as a PRIMARY KEY:

  • If your PRIMARY KEY is composite (like in many-to-many link tables)
  • If your PRIMARY KEY is natural (like, a state code)
  • If your PRIMARY KEY should be unique across databases (in this case you use GUID / UUID / NEWID)

All these cases imply the following condition:

You shouldn't have IDENTITY when you care for the values of your PRIMARY KEY and explicitly insert them into your table.


Many-to-many link tables should have the pair of id's to the table they link as the composite key.

It's a natural composite key which you already have to use (and make UNIQUE), so there is no point to generate a surrogate key for this.

I don't see why would you want to reference a many-to-many link table from any other table except the tables they link, but let's assume you have such a need.

In this case, you just reference the link table by the composite key.

This query:

CREATE TABLE a (id, data)
CREATE TABLE b (id, data)
CREATE TABLE ab (a_id, b_id, PRIMARY KEY (a_id, b_id))
CREATE TABLE business_rule (id, a_id, b_id, FOREIGN KEY (a_id, b_id) REFERENCES ab)

FROM    business_rule br
JOIN    a
ON      a.id = br.a_id

is much more efficient than this one:

CREATE TABLE a (id, data)
CREATE TABLE b (id, data)
CREATE TABLE ab (id, a_id, b_id, PRIMARY KEY (id), UNIQUE KEY (a_id, b_id))
CREATE TABLE business_rule (id, ab_id, FOREIGN KEY (ab_id) REFERENCES ab)

FROM    business_rule br
JOIN    a_to_b ab
ON      br.ab_id = ab.id
JOIN    a
ON      a.id = ab.a_id

, for obvious reasons.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download