How to merge two tables with possible NULL values in the UNIQUE index?

Question

How can I merge (upsert and delete orphan rows) into tableA?

tableA:

+---------+--------+----------+-------+
| company | option | category | rates |
+---------+--------+----------+-------+
| a       | f      | null     | 2.5   |
+---------+--------+----------+-------+
| a       | f      | d        | 2     | *
+---------+--------+----------+-------+
| a       | g      | e        | 3     | **
+---------+--------+----------+-------+
| c       | g      | e        | 4     |
+---------+--------+----------+-------+
| d       | f      | d        | 1     |
+---------+--------+----------+-------+

* denotes an orphan row.
** denotes a value to change (3 -> 4).

Only touch companies that exist in tableB (a and c in the example; leave d alone).

tableB:

+---------+--------+----------+-------+
| company | option | category | rates |
+---------+--------+----------+-------+
| a       | f      | null     | 2.5   |
+---------+--------+----------+-------+
| a       | g      | e        | 4     |
+---------+--------+----------+-------+
| c       | g      | e        | 4     |
+---------+--------+----------+-------+

Both tables have a unique index on (company, option, category).

Desired result in tableA:

+---------+--------+----------+-------+
| company | option | category | rates |
+---------+--------+----------+-------+
| a       | f      | null     | 2.5   |
+---------+--------+----------+-------+
| a       | g      | e        | 4     | <-
+---------+--------+----------+-------+
| c       | g      | e        | 4     |
+---------+--------+----------+-------+
| d       | f      | d        | 1     |
+---------+--------+----------+-------+

Only the second row (a,f,d,2) is deleted, and rates is changed from 3 to 4 for (a,g,e).

Here is a fiddle: https://rextester.com/QUVC30763

I thought I would first delete the orphan rows with this:

DELETE from tableA
 USING tableB
 WHERE 
   -- ignore rows with IDs that don't exist in tableB
   tableA.company = tableB.company
   -- ignore rows that have an exact all-column match in tableB
   AND NOT EXISTS 
      (select * from tableB 
      where tableB.company is not distinct from tableA.company 
      AND tableB.option is not distinct from tableA.option 
      AND tableB.category is not distinct from tableA.category );

And then upsert with this:

 INSERT INTO tableA (company, option, category, rates) 
   SELECT company, option, category, rates
   FROM   tableB
 ON CONFLICT (company, option, category) 
 DO update
   set rates= EXCLUDED.rates
 WHERE 
      tableA.rates IS DISTINCT FROM 
      EXCLUDED.rates;

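But the problem with the upsert is that it does not handle nullable fields. I would have to store -1 in place of null, otherwise the upsert cannot tell whether a duplicate exists. I feel that storing -1 in place of null will create many workarounds in the future, so I would like to avoid it if I can.

Note: I found that INSERT ... ON CONFLICT ... DO UPDATE is probably the way to go, but I have not seen a query suitable for my case, and I am not sure whether it can work with nullable fields. Hence the question:
Is there a clean way to merge with nullable fields?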

Answer 1

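I think you are on the right track. But there is a design issue with NULL vs. UNIQUE:

The columns option and category can be NULL, and for your purposes NULL is considered equal to NULL in these cases. Your current unique index does not treat NULL values as equal, so it does not enforce your requirement. That creates ambiguity even before you start merging. NULL values are a poor fit for what you are trying to implement, and working around them creates more work and additional points of failure. Consider using a special value instead of NULL, and everything falls into place. You are considering -1; anything that makes sense naturally for your actual data types and the nature of the attributes will do.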
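To make the ambiguity concrete, here is a minimal sketch on a scratch table. The table name null_demo, the index name, and the text / numeric column types are assumptions for illustration only, not taken from the question. It shows that a plain unique index accepts two rows whose keys differ only in "NULL vs. NULL", while it rejects a true duplicate without NULLs:

```sql
-- Hypothetical scratch table; column types are assumed.
CREATE TEMP TABLE null_demo (
   company  text,
   option   text,
   category text,
   rates    numeric
);
CREATE UNIQUE INDEX null_demo_uni ON null_demo (company, option, category);

-- Both inserts succeed: the index treats the two NULLs as distinct values.
INSERT INTO null_demo VALUES ('a', 'f', NULL, 2.5);
INSERT INTO null_demo VALUES ('a', 'f', NULL, 9.9);

-- Without NULL in the key, the second insert fails with a unique violation.
INSERT INTO null_demo VALUES ('a', 'f', 'd', 2);
INSERT INTO null_demo VALUES ('a', 'f', 'd', 3);
```

For the same reason, ON CONFLICT (company, option, category) never fires for a row like (a, f, NULL): as far as the index is concerned, there is no conflict.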

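That said, the DELETE has an additional, subtle hidden issue: it tries to delete each orphan row as many times as there are matches on company in tableB. Nothing breaks, since the excess attempts do nothing, but it is needlessly expensive. Use EXISTS twice instead: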

DELETE FROM tableA a
WHERE  EXISTS (
   SELECT FROM tableB b
   WHERE a.company = b.company
   )
AND    NOT EXISTS (
   SELECT FROM tableB b
   WHERE (a.company, a.option, a.category) IS NOT DISTINCT FROM
         (b.company, b.option, b.category)
   );

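If you insist on using NULL values, the workaround is to split the UPSERT into an UPDATE followed by INSERT ... ON CONFLICT DO NOTHING. That is simpler and cheaper if you have no concurrent writes to the table. ON CONFLICT DO NOTHING works without specifying a conflict target, so you can use multiple partial indexes to implement your requirement and make it work. The manual: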

For ON CONFLICT DO NOTHING, it is optional to specify a conflict_target; when omitted, conflicts with all usable constraints (and unique indexes) are handled. For ON CONFLICT DO UPDATE, a conflict_target must be provided.
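A rough sketch of that workaround, under stated assumptions: company is NOT NULL, the existing unique index on (company, option, category) stays in place for rows without NULLs, and the partial index names are made up for illustration.

```sql
-- Partial unique indexes (hypothetical names) covering the NULL cases.
-- Assumption: company is NOT NULL.
CREATE UNIQUE INDEX tablea_uni_cat_null
   ON tableA (company, option)   WHERE category IS NULL;
CREATE UNIQUE INDEX tablea_uni_opt_null
   ON tableA (company, category) WHERE option IS NULL;
CREATE UNIQUE INDEX tablea_uni_both_null
   ON tableA (company)           WHERE option IS NULL AND category IS NULL;

-- Step 1: update rows that already exist, matching keys NULL-safely.
UPDATE tableA a
SET    rates = b.rates
FROM   tableB b
WHERE  (a.company, a.option, a.category) IS NOT DISTINCT FROM
       (b.company, b.option, b.category)
AND    a.rates IS DISTINCT FROM b.rates;

-- Step 2: insert what is still missing. No conflict target is given,
-- so a duplicate caught by any of the unique indexes is skipped.
INSERT INTO tableA (company, option, category, rates)
SELECT company, option, category, rates
FROM   tableB
ON CONFLICT DO NOTHING;
```

Creating the partial indexes fails if tableA already holds rows that are duplicates under NULL-safe comparison, so those would have to be cleaned up first.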

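But if you fix your schema with a working UNIQUE index or constraint, the UPSERT you already have just works.

And make sure there are no concurrent writes to the table, or you may face race conditions and/or deadlocks unless you do more ...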
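One possible shape of such a schema fix, as a sketch only: it assumes option and category are text columns and that the empty string never occurs as real data. Pick whatever sentinel suits your actual types (the question mentions -1). The same change would be needed in tableB so that both sides agree.

```sql
-- Assumption: '' is a safe stand-in for "no value" in both columns.
-- This UPDATE raises a unique violation if it would produce duplicate
-- keys, which exposes rows that were already ambiguous.
UPDATE tableA
SET    option   = COALESCE(option, '')
     , category = COALESCE(category, '')
WHERE  option IS NULL OR category IS NULL;

-- Forbid NULL from now on, so the existing unique index on
-- (company, option, category) enforces the intended rule.
ALTER TABLE tableA
   ALTER COLUMN option   SET DEFAULT ''
 , ALTER COLUMN option   SET NOT NULL
 , ALTER COLUMN category SET DEFAULT ''
 , ALTER COLUMN category SET NOT NULL;
```

With both columns NOT NULL, ON CONFLICT (company, option, category) DO UPDATE can detect every duplicate, and the DELETE could use plain = comparisons instead of IS NOT DISTINCT FROM.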

postgresql

null

upsert

unique-index

postgresql-10