Darybrain Darybrain - 5 days ago 7
SQL Question

SQL Server table design advice

This may be a stupid question.

Some efficiency advice for SQL Server 2008 R2 although these servers will be upgraded to 2014 over the next few months. I am creating 3 tables.


  • t1
    has many columns of varying types and an identity numeric ID which is the primary key. I anticipate this table to get into the low 1000s of rows.

  • t2
    has a one-many relationship with t1. It's primary key will be a composite of a t2 ID and t1 ID. The t2 numeric ID will be sent by the frontend application and will be unique per t1 ID. I anticipate this table to get to 50000+ rows. Other than these 2 IDs it will contain a handful of varchar columns of varying lengths.

  • t3
    has a one-many relationship with t2. It's primary key could be a composite of a t3 ID, the t1 ID, and the t2 ID. Again the t3 numeric ID will be sent by the frontend application and will be unique per t2 ID. I anticipate this table to get to several million rows. Other than these 3 IDs it will contain a small number of date or numeric columns.



My question is within t2 should I set up an identity column which t3 will refer to essentially meaning that the primary key of t3 will be 2 columns instead of 3, i.e. a t3 ID and the identity column from t2. Would this be more efficient? Should this identity column within t2 be indexed? to help with joins?

Should I be doing anything else?

Answer

My first thought concerns the statement

... the [t2|t3] numeric ID will be sent by the frontend application ...

If you are generating unique identifiers for use in a database I would suggest you let the database generate the numbers. Unless you can be absolutely sure the application will not generate duplicate IDs.

In response to your comment, here's the difference when using surrogate keys (if I have understood your table structure correctly)

     +----+   +----+   +----+
     |    |   |    |   |    |
     | t1 +---+ t2 +---+ t3 |
     |    |   |    |   |    |
     +----+   +----+   +----+

keys: t1.id    t1.id    t1.id
               t2.id    t2.id
                        t3.id

A change to surrogate keys it would not change t1 (no need); t2 would also remain the same, but you would need to change the t2.id from being a line number to a unique number primary key [pk]. Trouble is you will have to store the 'line number' as another field in the table.

The main difference would be t3 only needing the t2.pk and its own row identifier, rather than all three table identifiers.

      +----+   +----+   +----+
      |    |   |    |   |    |
      | t1 +---+ t2 +---+ t3 |
      |    |   |    |   |    |
      +----+   +----+   +----+

keys: t1.id     t2.pk    t2.pk
                t1.id    t3.pk

Would this be better? I'm not sure. It would also depend in the type, and frequency, of the queries you are running against the tables (see my comment on your question).

If you mainly query by id then the first layout will be the best - lookup rows in t3 without having to join to t2 or t1. But if you need information from those other tables to do your searches, a surrogate key structure may make your joins less verbose.

Ultimately, this is all premature optimisation (in terms of query performance) because you won't know what performs best until you implement it.

Comments