a paid nerd a paid nerd - 3 months ago 15
SQL Question

Design question: Filterable attributes, SQL

I have two tables in my database,

Operation
and
Equipment
. An operation requires zero or more attributes. However, there's some logic in how the attributes are attributed:


  • Operation
    Foo
    requires equipment
    A
    and
    B

  • Operation
    Bar
    requires no equipment

  • Operation
    Baz
    requires equipment
    B
    and either
    C
    or
    D

  • Operation
    Quux
    requires equipment (
    A
    or
    B
    ) and (
    C
    or
    D
    )



What's the best way to represent this in SQL?

I'm sure people have done this before, but I have no idea where to start.

(FWIW, my application is built with Python and Django.)

Update 1: There will be around a thousand
Operation
rows and about thirty
Equipment
rows. The information is coming in CSV form similar to the description above:
Quux, (A & B) | (C & D)


Update 2: The level of conjunctions & disjunctions shouldn't be too deep. The
Quux
example is probably the most complicated, though there appears to be a
A | (D & E & F)
case.

Answer

Think about how you'd model the operations in OO design: the operations would be subclasss of a common superclass Operation. Each subclass would have mandatory object members for the respective equipment required by that operation.

The way to model this with SQL is Class Table Inheritance. Create a common super-table:

CREATE TABLE Operation (
  operation_id   SERIAL PRIMARY KEY,
  operation_type CHAR(1) NOT NULL,
  UNIQUE KEY (operation_id, operation_type),
  FOREIGN KEY (operation_type) REFERENCES OperationTypes(operation_type)
);

Then for each operation type, define a sub-table with a column for each required equipment type. For example, OperationFoo has a column for each of equipA and equipB. Since they are both required, the columns are NOT NULL. Constrain them to the correct types by creating a Class Table Inheritance super-table for equipment too.

CREATE TABLE OperationFoo (
  operation_id   INT PRIMARY KEY,
  operation_type CHAR(1) NOT NULL CHECK (operation_type = 'F'),
  equipA         INT NOT NULL,
  equipB         INT NOT NULL,
  FOREIGN KEY (operation_id, operation_type) 
      REFERENCES Operations(operation_d, operation_type),
  FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
  FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id)
);

Table OperationBar requires no equipment, so it has no equip columns:

CREATE TABLE OperationBar (
  operation_id   INT PRIMARY KEY,
  operation_type CHAR(1) NOT NULL CHECK (operation_type = 'B'),
  FOREIGN KEY (operation_id, operation_type) 
      REFERENCES Operations(operation_d, operation_type)
);

Table OperationBaz has one required equipment equipA, and then at least one of equipB and equipC must be NOT NULL. Use a CHECK constraint for this:

CREATE TABLE OperationBaz (
  operation_id   INT PRIMARY KEY,
  operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Z'),
  equipA         INT NOT NULL,
  equipB         INT,
  equipC         INT,
  FOREIGN KEY (operation_id, operation_type) 
      REFERENCES Operations(operation_d, operation_type)
  FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
  FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
  FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
  CHECK (COALESCE(equipB, equipC) IS NOT NULL)
);

Likewise in table OperationQuux you can use a CHECK constraint to make sure at least one equipment resource of each pair is non-null:

CREATE TABLE OperationQuux (
  operation_id   INT PRIMARY KEY,
  operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Q'),
  equipA         INT,
  equipB         INT,
  equipC         INT,
  equipD         INT,
  FOREIGN KEY (operation_id, operation_type) 
      REFERENCES Operations(operation_d, operation_type),
  FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
  FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
  FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
  FOREIGN KEY (equipD) REFERENCES EquipmentD(equip_id),
  CHECK (COALESCE(equipA, equipB) IS NOT NULL AND COALESCE(equipC, equipD) IS NOT NULL)
);

This may seem like a lot of work. But you asked how to do it in SQL. The best way to do it in SQL is to use declarative constraints to model your business rules. Obviously, this requires that you create a new sub-table every time you create a new operation type. This is best when the operations and business rules never (or hardly ever) change. But this may not fit your project requirements. Most people say, "but I need a solution that doesn't require schema alterations."

Most developers probably don't do Class Table Inheritance. More commonly, they just use a one-to-many table structure like other people have mentioned, and implement the business rules solely in application code. That is, your application contains the code to insert only the equipment appropriate for each operation type.

The problem with relying on the app logic is that it can contain bugs and might insert data the doesn't satisfy the business rules. The advantage of Class Table Inheritance is that with well-designed constraints, the RDBMS enforces data integrity consistently. You have assurance that the database literally can't store incorrect data.

But this can also be limiting, for instance if your business rules change and you need to adjust the data. The common solution in this case is to write a script to dump all the data out, change your schema, and then reload the data in the form that is now allowed (Extract, Transform, and Load = ETL).

So you have to decide: do you want to code this in the app layer, or the database schema layer? There are legitimate reasons to use either strategy, but it's going to be complex either way.


Re your comment: You seem to be talking about storing expressions as strings in data fields. I recommend against doing that. The database is for storing data, not code. You can do some limited logic in constraints or triggers, but code belongs in your application.

If you have too many operations to model in separate tables, then model it in application code. Storing expressions in data columns and expecting SQL to use them for evaluating queries would be like designing an application around heavy use of eval().