Marcus Frölander Marcus Frölander - 5 months ago 11
SQL Question

A more effective way to structure database for pivots

To describe the question as good as possible, I'll detail the actual scenario:
The site will allow users to add IP's for monitoring them against RBL's. In doing so, I've been pondering different approaches to structuring the database in a manner that is as optimal as possible in terms of efficiency for large IP blocks, whilst still being workable.

The project is built on Laravel, and I've set up a database structure containing:


table_a

Contains information about the IP we monitor.


  • id (auto inc, primary)

  • name (varchar, 128) Friendly name for the monitor

  • ip (varchar, 16) The IP to monitor

  • email (varchar, 128) An e-mail for notifications

  • notifications (tinyint, 1) A toggle for notifications

  • timestamps




-


table_b

Contains information about the RBL's we monitor against.


  • id (auto inc, primary)

  • url (varchar, 255) The URL for the monitor

  • active (tinyint, 1) Toggle for whether we actively check toward it or not




-


table_a_b

A pivot table to maintain the status of each RBL in.


  • table_a_id (int, 10) Foreign key to id on table_a

  • table_b_id (int, 10) Foreign key to id on table_b

  • listed (tinyint, 1) Whether or not the IP is listed on this RBL

  • notified (tinyint, 1) Whether or not we've already notified the user




So the way it currently works is that when adding an IP, it adds the IP into table_a, as well as
(number of rows in table_b * number of IP's)
into the pivot table. Whilst adding it doesn't take that long - the issue I see is that adding a /24 of IP's (256 IP's) tracked against (currently) 87 RBL's creates a grand total of 22,272 records. That's for a /24. A /22 (1024 IP's) would be 89,088 records. That's a pretty large amount for a single user, and I can see how that'd quickly destroy database performance.

The alternative I've thought of is to keep a column named
listed_on
on
table_a
which is a list of any RBL they are listed in. The row would contain something along the line of
55|32|11
- which would be simple enough to parse in PHP. Still, with a large number of users, I can see that hitting PHP performance instead with a large amount of string handling.

Am I missing an obvious solution here, or are these two ones (perhaps especially the latter) the best bet?

Cheers!

Answer

The straigtforward relational design is (instead of table_a_b) tables listed(ip, rbl) "ip IP is listed on rbl RBL" and notified(ip, rbl) "user has been notifed that ip IP is listed on rbl RBL". Relational tables are intended to hold the rows that make a true proposition (statement) from some table-specific predicate (sentence template parameterized by columns). Tables that should contain a row for every possible value for a column or columns are rare.

Whether instead a design should be used that combines these tables depends on what your predicates are and the constraints that follow from the them and the situations that can arise. Eg if notifications only ever happen to listed IP-RBL pairs then maybe best is url_ibl(ip, rbl, notified) "ip IP is in rbl RBL and NOTIFIED is whether user has been notified". (Here the tradeoff is more smaller tables & more joins vs fewer larger tables & more searching.)