Merge duplicate table rows that are used in a relation with another table

Question

I have the following table structure:

table_a
 id | customer_id | product_id
----+-------------+------------
  1 | c1          | p1
  2 | c1          | p1
  3 | c2          | p1

table_b
 id | table_a_id | attribute
----+------------+-----------
 99 |          1 | a1
 98 |          2 | a2
 97 |          3 | a3
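
To reproduce the sample data locally, a setup script along these lines should work. It is only a sketch: the column types are assumptions (the question does not state them), and no foreign key is declared because the question does not say whether one exists.

```sql
-- Hypothetical DDL: column types are assumed, not taken from the question.
create table table_a (
    id          integer primary key,
    customer_id text not null,
    product_id  text not null
);

create table table_b (
    id         integer primary key,
    table_a_id integer not null,  -- refers to table_a.id; no FK declared here
    attribute  text
);

-- Sample rows exactly as shown in the tables above.
insert into table_a (id, customer_id, product_id) values
    (1, 'c1', 'p1'),
    (2, 'c1', 'p1'),
    (3, 'c2', 'p1');

insert into table_b (id, table_a_id, attribute) values
    (99, 1, 'a1'),
    (98, 2, 'a2'),
    (97, 3, 'a3');
```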

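As you can see, table_a contains duplicate rows, and I want to merge them. Unfortunately, the primary key of table_a is also referenced by table_b.

The end result should be: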

table_a
 id | customer_id | product_id
----+-------------+------------
  1 | c1          | p1
  3 | c2          | p1

table_b
 id | table_a_id | attribute
----+------------+-----------
 99 |          1 | a1
 98 |          1 | a2
 97 |          3 | a3

I have to update the references from table_b to table_a, and then remove all the unused keys from table_a.

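Unfortunately, the only queries I have come up with are really heavy, and the database times out before they can finish. table_a has 200k+ records, and table_b has at least twice as many.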

My idea was:

Join table_a and table_b to get: (table_b_id, table_a_customer_id, table_a_product_id).
Get a grouped version of table_a (to pick the id to keep, I just used min("id")).
Inner join the two results above, and use the outcome to update table_b.
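
The three steps above can be collapsed into a single statement. What follows is a minimal sketch, assuming PostgreSQL's UPDATE ... FROM syntax; the alias g and the column name keep_id are made up for illustration and do not come from the question.

```sql
-- Sketch of the plan above (PostgreSQL syntax assumed).
-- The subquery "g" maps every (customer_id, product_id) pair to its smallest id.
update table_b tb
set table_a_id = g.keep_id
from table_a ta
join (
    select customer_id, product_id, min(id) as keep_id
    from table_a
    group by customer_id, product_id
) g on g.customer_id = ta.customer_id
   and g.product_id  = ta.product_id
where tb.table_a_id = ta.id
  and ta.id <> g.keep_id;  -- skip rows that already point at the kept id
```

This only repoints table_b. The duplicate rows in table_a still have to be deleted in a separate step.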

Answer 1

Here is an option that uses common table expressions:

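with 
    -- attach to every row the smallest id of its (customer_id, product_id) group
    ta as (
        select ta.*, min(id) over(partition by customer_id, product_id) as min_id
        from table_a ta
    ),
    -- repoint table_b rows that reference a duplicate to the id that is kept
    upd as (
        update table_b tb
        set table_a_id = ta.min_id
        from ta
        where tb.table_a_id = ta.id and ta.id <> ta.min_id
    )
-- delete every row that has a same-group row with a smaller id
delete from table_a ta1
using ta
where 
    ta1.customer_id = ta.customer_id
    and ta1.product_id = ta.product_id
    and ta1.id > ta.id;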

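The first CTE associates the target id with each row of table_a. We then use that information to update table_b. Finally, we delete the duplicate rows from table_a, keeping only the row with the lowest id in each group.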
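
After running the statement, the result can be sanity-checked with two queries. This is a sketch that assumes the table and column names from the question; both queries should return no rows if the merge succeeded.

```sql
-- Expect zero rows: no (customer_id, product_id) pair appears more than once.
select customer_id, product_id, count(*) as copies
from table_a
group by customer_id, product_id
having count(*) > 1;

-- Expect zero rows: every table_b row still points at an existing table_a row.
select tb.id, tb.table_a_id
from table_b tb
left join table_a ta on ta.id = tb.table_a_id
where ta.id is null;
```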

sql

postgresql

duplicates

sql-update

sql-delete