Upserting to MySQL, but with multiple columns and unique index as duplicate check?

Question

I've seen many people ask about upserting (this, this, this, this, this, this, and more, and even the official doc).

However, it isn't explained well enough for a newbie how to create the duplicate key using the primary key or a unique index.

What I need:
If a unique combination of 3 columns of table1 (attributeId, entityId, carId) has a duplicate in table2, then update the value column. Otherwise, take the row of table1 and insert it into table2.

The attributeId, entityId, carId combination is unique for every row.
That is, if a row has the columns 1,2,5, then no other row will have 1,2,5. But another row may have 5,1,2 or 3,4,2, etc.

The puzzle here is about creating the unique index. Is it sufficient to just do this:

CREATE INDEX PIndex ON table1 (attributeId, entityId, carId);

Or is it necessary to delete all other indexes, then create this index, and then run a query something like this? (pseudocode below):

    INSERT INTO table1 (attributeId, entityId, carId, value, name)
    VALUES (table2.attributeId, table2.entityId, table2.carId, table2.value, table2.name)
    ON DUPLICATE KEY UPDATE value=VALUES(value);

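The basic logic is:
If, for a row in table2, there is a corresponding row in table1 with exactly the same attributeId, entityId and carId values, then update the value column in table1 with the value of the value column in table2. If there is no corresponding row, then take the row from table2 and append it to table1.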

Answer 1

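On MySQL versions that still support it (the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4), you can use the syntax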

ALTER IGNORE TABLE table1 ADD UNIQUE INDEX PIndex (attributeId, entityId, carId);

According to the documentation:

If IGNORE is specified, only one row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

Sadly, it doesn't specify which value will be kept. From some testing it seems to keep the first occurrence, but you can never be sure.

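If you don't mind which entry gets deleted, this is the easiest solution. Otherwise, if you want more control, it is better to go through a temporary table.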

The command CREATE UNIQUE INDEX PIndex ON table1 (attributeId, entityId, carId); (note the added UNIQUE) will simply fail on the first duplicate key, and it offers no option for managing the duplicates.
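The temporary-table route could look roughly like the sketch below. It is only an illustration: the names table1_dedup, table1_old and uq_attr_entity_car are hypothetical, the rule "keep the largest value per triple" is an assumption you would replace with your own, and it assumes table1 has no required columns beyond the five listed in the question.

```sql
-- 1. List the triples that occur more than once in table1.
SELECT attributeId, entityId, carId, COUNT(*) AS copies
  FROM table1
 GROUP BY attributeId, entityId, carId
HAVING COUNT(*) > 1;

-- 2. Build an empty copy of table1 and put the unique index on it.
--    (CREATE TABLE ... LIKE also copies existing indexes, hence the
--    new, hypothetical index name.)
CREATE TABLE table1_dedup LIKE table1;
ALTER TABLE table1_dedup
  ADD UNIQUE INDEX uq_attr_entity_car (attributeId, entityId, carId);

-- 3. Copy one row per triple. Assumed rule: keep the largest value.
--    MAX(value) and MAX(name) may come from different source rows.
INSERT INTO table1_dedup (attributeId, entityId, carId, value, name)
SELECT attributeId, entityId, carId, MAX(value), MAX(name)
  FROM table1
 GROUP BY attributeId, entityId, carId;

-- 4. Swap the tables once the copy looks right.
RENAME TABLE table1 TO table1_old, table1_dedup TO table1;
```

If the first query returns no rows, there is nothing to clean up and the unique index can be added to table1 directly.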

Answer 2

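It appears the specification is for two different operations: 1) an update of existing rows in table1, and 2) an insert of new rows into table2.

The specification says "update the value column"... we're taking that to mean updating the value column of the row in table1.

The specification also says "insert ... into table2".

What is confusing is that the specification also shows example pseudocode for an INSERT INTO table1.

To perform an update of table1 based on values in table2, assuming that we want to ignore rows that have a NULL value in any of the three columns...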

 UPDATE table1 t
   JOIN table2 s
     ON t.attributeid = s.attributeid
    AND t.entityid    = s.entityid
    AND t.carid       = s.carid
    SET t.value  = s.value

If there are "duplicates" in table2 (i.e. multiple rows in table2 with the same values for the three columns attributeid, entityid, carid), it is indeterminate which of those rows value will be taken from.
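If that indeterminacy matters, one option is to collapse table2 to a single row per triple before joining. The sketch below assumes, purely for illustration, that the largest value should win; any other aggregate or tie-breaking rule could be substituted.

```sql
-- Update table1 from a de-duplicated view of table2.
-- Assumed rule: when table2 has several rows for a triple, use MAX(value).
UPDATE table1 t
  JOIN ( SELECT attributeid
              , entityid
              , carid
              , MAX(value) AS value
           FROM table2
          GROUP BY attributeid, entityid, carid
       ) s
    ON t.attributeid = s.attributeid
   AND t.entityid    = s.entityid
   AND t.carid       = s.carid
   SET t.value = s.value;
```

Rows with a NULL in any of the three columns still never match, because the join uses plain equality.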

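To insert rows found in table2 that are "missing" from table1 (again, assuming that these three columns may not be unique in table2), we can use an anti-join pattern to eliminate rows that already have a "match" in table1.

For example: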

 INSERT INTO table1 (attributeid, entityid, carid, value)
 SELECT v.*
   FROM ( SELECT s.attributeid
               , s.entityid
               , s.carid
               , s.value   -- arbitrary row per group; needs ONLY_FULL_GROUP_BY off, or use ANY_VALUE(s.value)
            FROM table2 s
            LEFT
            JOIN table1 r
              ON r.attributeid = s.attributeid
             AND r.entityid    = s.entityid
             AND r.carid       = s.carid
           WHERE r.attributeid IS NULL      -- anti-join: no matching row in table1
             AND s.attributeid IS NOT NULL
             AND s.entityid    IS NOT NULL
             AND s.carid       IS NOT NULL
           GROUP
              BY s.attributeid
               , s.entityid
               , s.carid
        ) v

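If there are other UNIQUE constraints defined on other columns or combinations of columns, the statement could throw a "duplicate key" error. (Without knowing the key definitions, we're flying somewhat blind.) If we want the statement to succeed, we can add the IGNORE keyword and simply skip the rows that fail to insert because of a "unique key" violation.

Again, if there are rows in table2 with identical values in the three columns (nothing indicates that this combination of columns is unique in table2), it is indeterminate which of those rows value will be taken from.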

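The same operations can be performed in the reverse direction by swapping all references to the table names table1 and table2 that appear in the queries.

There is no need to add a UNIQUE KEY to either of the tables in order to perform these operations. There is (likely) a performance benefit to having a suitable index defined, with those three columns as the leading (first) columns in the index. (This doesn't necessarily need to be a UNIQUE index for this operation.)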
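As a small illustration of such an index, the statements below add a plain (non-unique) index on the three columns to each table. The index names are hypothetical, and whether the optimizer actually uses them depends on the data, so checking with EXPLAIN is worthwhile.

```sql
-- Non-unique indexes with the three join columns as leading columns.
-- ix_table1_triple / ix_table2_triple are made-up names.
CREATE INDEX ix_table1_triple ON table1 (attributeid, entityid, carid);
CREATE INDEX ix_table2_triple ON table2 (attributeid, entityid, carid);

-- Inspect the plan of the join used by the UPDATE and the anti-join.
EXPLAIN
SELECT t.attributeid, t.entityid, t.carid
  FROM table1 t
  JOIN table2 s
    ON t.attributeid = s.attributeid
   AND t.entityid    = s.entityid
   AND t.carid       = s.carid;
```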

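If that combination of columns should be unique, then by all means add a UNIQUE KEY on that combination of columns. But the specified operations can be performed without a UNIQUE KEY defined.

The MySQL INSERT ... ON DUPLICATE KEY syntax requires at least one PRIMARY KEY or UNIQUE KEY in order to operate. If there are multiple UNIQUE KEY constraints on the target table, and an INSERT would violate two or more of them, I believe it is indeterminate which of those keys will be used in the UPDATE operation. Personally, I tend to avoid using that syntax on tables that have more than one UNIQUE KEY defined.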
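For the case where the three-column key is the only PRIMARY or UNIQUE key on table1, a minimal sketch of the upsert from the question could look like this. It assumes table1 currently holds no duplicate triples (otherwise adding the index fails), the index name uq_attr_entity_car is hypothetical, and VALUES(value) is kept from the question's pseudocode even though MySQL 8.0.20 and later deprecate that usage.

```sql
-- One-time setup: make the triple unique in the target table.
ALTER TABLE table1
  ADD UNIQUE INDEX uq_attr_entity_car (attributeId, entityId, carId);

-- Upsert: insert rows from table2; where the triple already exists
-- in table1, overwrite only the value column.
INSERT INTO table1 (attributeId, entityId, carId, value, name)
SELECT s.attributeId, s.entityId, s.carId, s.value, s.name
  FROM table2 s
ON DUPLICATE KEY UPDATE value = VALUES(value);
```

If table2 itself contains several rows for the same triple, each one is applied in turn, so the final value again depends on the order in which the rows are read.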

mysql

indexing

upsert

database-indexes

on-duplicate-key