BadHorsie BadHorsie - 6 months ago 38
SQL Question

MySQL - How to normalize column containing delimiter-separated IDs

I'm trying to normalize a table which a previous developer designed to have a column containing pipe-separated IDs which link to other rows in the same table.

Customers Table

id | aliases (VARCHAR)
----------------------------
1 | |4|58|76
2 |
3 |
4 | |1|58|76
... |
58 | |1|4|76
... |
76 | |1|4|58


So customer 1, 4, 58 and 76 are all "aliases" of each other. Customer 2 and 3 have no aliases, so the field contains an empty string.

I want to do away with the entire "alias" system, and normalise the data so I can map those other customers all to the one record. So I want related table data for customer 1, 4, 58, and 76 all to be mapped just to customer 1.

I figured I would populate a new table which later I can then join and perform updates on other tables.

Join Table

id | customer_id | alias_id
-------------------------------
1 | 1 | 4
2 | 1 | 58
3 | 1 | 76


How can I get the data from that first table, into the above format? If this is going to be an absolute nightmare in pure SQL, I will just write a PHP script which attempts to do this work and insert the data.

Answer

When I started to answer this question, I thought it would be quick and easy because I'd done something very similar once in SQL Server, but proving out the concept in translation burgeoned into this full solution.

One caveat that wasn't clear from your question is whether you have a condition for declaring the primary id vs the alias id. For instance, this solution will allow 1 to have an alias of 4 as well as 4 to have an alias of 1, which is consistent with the provided data in your simplified example question.

To setup the data for this example, I used this structure:

CREATE TABLE notnormal_customers (
  id INT NOT NULL PRIMARY KEY,
  aliases VARCHAR(10)
);

INSERT INTO notnormal_customers (id,aliases)
VALUES
(1,'|4|58|76'),
(2,''),
(3,''),
(4,'|1|58|76'),
(58,'|1|4|76'),
(76,'|1|4|58');

First, in order to represent the one-to-many relationship for one-customer to many-aliases, I created this table:

CREATE TABLE customer_aliases (
    primary_id INT NOT NULL,
    alias_id INT NOT NULL,
    FOREIGN KEY (primary_id) REFERENCES notnormal_customers(id),
    FOREIGN KEY (alias_id)   REFERENCES notnormal_customers(id),
    /* clustered primary key prevents duplicates */
    PRIMARY KEY (primary_id,alias_id)
)

Most importantly, we'll use a custom SPLIT_STR function:

CREATE FUNCTION SPLIT_STR(
  x VARCHAR(255),
  delim VARCHAR(12),
  pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
       LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
       delim, '');

Then we'll create a stored procedure to do all the work. Code is annotated with comments to source references.

DELIMITER $$
CREATE PROCEDURE normalize_customers()
BEGIN

  DECLARE cust_id INT DEFAULT 0;
  DECLARE al_id INT UNSIGNED DEFAULT 0;
  DECLARE alias_str VARCHAR(10) DEFAULT '';
  /* set the value of the string delimiter */
  DECLARE string_delim CHAR(1) DEFAULT '|';
  DECLARE count_aliases INT DEFAULT 0;
  DECLARE i INT DEFAULT 1;

  /*
    use cursor to iterate through all customer records
    http://burnignorance.com/mysql-tips/how-to-loop-through-a-result-set-in-mysql-strored-procedure/
  */
  DECLARE done INT DEFAULT 0;
  DECLARE cur CURSOR FOR
      SELECT `id`, `aliases`
      FROM `notnormal_customers`;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  OPEN cur;
  read_loop: LOOP

    /*
      Fetch one record from CURSOR and set to customer id and alias string.
      If not found then `done` will be set to 1 by continue handler.
    */
    FETCH cur INTO cust_id, alias_str;
    IF done THEN
        /* If done set to 1 then exit the loop, else continue. */
        LEAVE read_loop;
    END IF;

    /* skip to next record if no aliases */
    IF alias_str = '' THEN
      ITERATE read_loop;
    END IF;

    /*
      get number of aliases
      https://pisceansheart.wordpress.com/2008/04/15/count-occurrence-of-character-in-a-string-using-mysql/
    */
    SET count_aliases = LENGTH(alias_str) - LENGTH(REPLACE(alias_str, string_delim, ''));

    /* strip off the first pipe to make it compatible with our SPLIT_STR function */
    SET alias_str = SUBSTR(alias_str, 2);

    /*
      iterate and get each alias from custom split string function
      http://stackoverflow.com/questions/18304857/split-delimited-string-value-into-rows
    */
    WHILE i <= count_aliases DO

      /* get the next alias id */
      SET al_id = CAST(SPLIT_STR(alias_str, string_delim, i) AS UNSIGNED);
      /* REPLACE existing values instead of insert to prevent errors on primary key */
      REPLACE INTO customer_aliases (primary_id,alias_id) VALUES (cust_id,al_id);
      SET i = i+1;

    END WHILE;
    SET i = 1;

  END LOOP;
  CLOSE cur;

END$$
DELIMITER ;

Finally you can simply run it by calling:

CALL normalize_customers();

Then you can check the data in console:

mysql> select * from customer_aliases;
+------------+----------+
| primary_id | alias_id |
+------------+----------+
|          4 |        1 |
|         58 |        1 |
|         76 |        1 |
|          1 |        4 |
|         58 |        4 |
|         76 |        4 |
|          1 |       58 |
|          4 |       58 |
|         76 |       58 |
|          1 |       76 |
|          4 |       76 |
|         58 |       76 |
+------------+----------+
12 rows in set (0.00 sec)