This seems like a duplicate even as I ask it, but I searched and didn't find it. It seems like a good question for SO -- even though I'm sure I can find it on many blogs etc. out there. SO will have more of a debate than you can get on a blog.
I'm running into an issue with a join: getting back too many records. I think of this as "expansion". I added a table to the set of joins and the number of rows expanded, way too much. Usually when this happens I add a select of all the ID fields that are involved in the join. That way it's pretty obvious where the expansion is happening and I can change the ON of the join to fix it. Except in this case, the table that I added doesn't have an ID field. To me, this is a problem. But perhaps I'm wrong.
The question: should every table in a database have an IDENTITY field that's used as the PK? Are there any drawbacks to having an ID field in every table? What if you're reasonably sure this table will never be used in a PK/FK relationship?
Related, but not duplicate: http://stackoverflow.com/questions/932913/when-having-an-identity-column-is-not-a-good-idea
Apparently this debate has been going on for a while. Shoulda known.
This post (surrogate vs. natural keys) is also relevant.
There are two concepts that are close but should not be confused:
Every table (except for the rare conditions) should have a
PRIMARY KEY, that is a value or a set of values that uniquely identify a row.
See here for discussion why.
IDENTITY is a property of a column in
SQL Server which means that the column will be filled automatically with incrementing values.
Due to the nature of this property, the values of this column are inherently
UNIQUE constraint or
UNIQUE index is automatically created on
IDENTITY column, and after issuing
SET IDENTITY_INSERT ON it's possible to insert duplicate values into an
IDENTITY column, unless it had been explicity
IDENTITY column should not necessarily be a
PRIMARY KEY, but most often it's used to fill the surrogate
It may or may not be useful in any particular case.
Therefore, the answer to your question:
The question: should every table in a database have an IDENTITY field that's used as the PK?
IDENTITYfield as a
Three cases come into my mind when it's not the best idea to have an
IDENTITY as a
PRIMARY KEYis composite (like in many-to-many link tables)
PRIMARY KEYis natural (like, a state code)
PRIMARY KEYshould be unique across databases (in this case you use
All these cases imply the following condition:
IDENTITYwhen you care for the values of your
PRIMARY KEYand explicitly insert them into your table.
Many-to-many link tables should have the pair of
id's to the table they link as the composite key.
It's a natural composite key which you already have to use (and make
UNIQUE), so there is no point to generate a surrogate key for this.
I don't see why would you want to reference a
many-to-many link table from any other table except the tables they link, but let's assume you have such a need.
In this case, you just reference the link table by the composite key.
CREATE TABLE a (id, data) CREATE TABLE b (id, data) CREATE TABLE ab (a_id, b_id, PRIMARY KEY (a_id, b_id)) CREATE TABLE business_rule (id, a_id, b_id, FOREIGN KEY (a_id, b_id) REFERENCES ab) SELECT * FROM business_rule br JOIN a ON a.id = br.a_id
is much more efficient than this one:
CREATE TABLE a (id, data) CREATE TABLE b (id, data) CREATE TABLE ab (id, a_id, b_id, PRIMARY KEY (id), UNIQUE KEY (a_id, b_id)) CREATE TABLE business_rule (id, ab_id, FOREIGN KEY (ab_id) REFERENCES ab) SELECT * FROM business_rule br JOIN a_to_b ab ON br.ab_id = ab.id JOIN a ON a.id = ab.a_id
, for obvious reasons.