Nav Nav - 1 month ago 6
MySQL Question

Upserting to MySQL, but with multiple columns and unique index as duplicate check?

I saw a lot of people have asked about upserting (this, this, this, this, this, this, and more and even the official doc).

However, something that is not explained well enough for newbies to understand is how to create the duplicate key using primary key or unique indexes.

What I need:

If a

table1
's unique combination of 3 columns (
attributeId, entityId, carId
) has a duplicate in
table2
, then update the
value
column. Else take
table1
's row and insert it into
table2
.

The
attributeId, entityId, carId
combination will be unique for every row.

ie: If a row has columns as
1,2,5
, then no other row will have
1,2,5
. But another row might have
5,1,2
or
3,4,2
etc.

The dilemma here is about creating the unique index. Is it sufficient to just do it like this:

CREATE INDEX PIndex ON table1 (attributeId, entityId, carId);


or is it necessary to delete all other indexes and then create this index and then run a query like this? (pseudocode below):

INSERT INTO table1 (attributeId, entityId, carId, value, name)
VALUES (table2.attributeId,table2.entityId,table2.carId,table2.value,table2.name)
ON DUPLICATE KEY UPDATE value=VALUES(value);


The basic logic being:

If for a row in
table2
, there is a corresponding row in
table1
with exactly the same values for attributeId, entityId and carId, then update the
value
column in
table1
with the value of the
value
column in
table2
. If there is no corresponding row, then take the row of
table2
and append it to
table1
.

Answer

Seems like the specification is for two different operations: 1) an UPDATE of existing rows in table1, and 2) an INSERT of new rows into table2.

The specification says "update the value column"... we take that to mean update the value column in the row of table1.

The specification also says "insert ... into table2.

Confusingly, the specification also shows an example pseudo-code INSERT INTO table1.


To perform an UPDATE of table1 based on values in table2, assuming we are going to ignore rows that have a NULL value in any of the three columns...

 UPDATE table1 t
   JOIN table2 s
     ON t.attributeid = s.attributeid
    AND t.entityid    = s.entityid
    AND t.carid       = s.carid
    SET t.value  = s.value 

If there are "duplicates" in table2 (i.e. multiple rows in table2 with the same values of the three columns attributeid, entityid and carid, it is indeterminate which of those rows value will be taken from.


To insert a row that is found in table2 but "missing" from table1 (again assuming those three columns may not be unique in table2), we can use an anti-join pattern to eliminate rows which already have a "match" in table1.

For example:

 INSERT INTO table1 (attributeid, entityid, carid, value)
 SELECT v.*
   FROM ( SELECT s.attribute_id
               , s.entity_id
               , s.carid
               , s.value
            FROM table2 s
            LEFT
            JOIN table1 r
              ON r.attributeid = s.attributeid
             AND r.entityid    = s.entityid
             AND r.carid       = s.carid
           WHERE r.attributeid IS NULL
             AND s.attributeid IS NOT NULL
             AND s.entityid    IS NOT NULL
             AND s.carid       IS NOT NULL
           GROUP
              BY s.attributeid
               , s.entityid
               , s.carid
        ) v

If there are "duplicates" in table2 (i.e. multiple rows in table2 with the same values of the three columns attributeid, entityid and carid, it is indeterminate which row value will be taken from.

If there are other UNIQUE constraints defined on other columns, or combinations of columns, the statement has a potential to throw a "duplicate key" error. (Without knowing the key definitions, we're kinda flying blind.) We could add the IGNORE keyword if we want the statement to succeed, just ignoring rows that fail to insert due to "unique key" violations.)

Again, if there are rows in table2 with the same values in the the three columns (no indication is given that this combination of columns is unique in table2), it's indeterminate which of those rows value will be taken from.

The same operations can be performed in the opposite direction, swapping all occurrences of the table references table1 and table2 in the queries.


It's not necessary to add a UNIQUE KEY to either of the tables to perform these operations. There would (likely) be a performance benefit to having a suitable index defined, with those three columns as the leading (first) columns in the index. (That doesn't necessarily need to be a UNIQUE index for this operation.)

If that combination of columns should be unique, then by all means add a UNIQUE KEY on that combination of columns. But the specified operations can be performed without a UNIQUE KEY defined.

The MySQL INSERT ... ON DUPLICATE KEY syntax does require at least one PRIMARY KEY or UNIQUE KEY to operate. If there are multiple UNIQUE KEY constraints on the target table, and an INSERT would violate two or more of the unique key constraints, I believe it's indeterminate which of those keys will be used in the UPDATE action. Personally, I'd tend to steer clear of using that syntax on a table with more than one UNIQUE KEY defined.

Comments