Postgres, get two row values that are both linked to the same ID

Question

I've run into a fairly tricky database problem that has me stumped, and I'd appreciate any help.

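I have a table containing data from several different sources. Data from different sources can be "duplicated", and we have a way to determine when that is the case.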

Each row in the table has an "id". If a row is identified as a duplicate of another row, we merge it and give it a "merged_into_id", which refers to another row in the same table.

I'm trying to run a report that returns information about the duplicates we've identified between two of these sources.

Say I have three sources: A, B and C. I want to identify all duplicate rows between source A and source B.

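My query works fine when a row from source A is merged directly into a row from source B. However, we also have cases in the database where a source A row and a source B row are both merged into a source C row. Those are the ones I'm struggling with, and I'm hoping someone can help.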

An example:

Original data:

id	source	merged_into_id
1	A	3
2	B	3
3	C	NULL

What I'd like to do is return id 1 and id 2 from the table, because they are both merged into the same ID, for example like this:

source_a_id	source_b_id
1	2

But I'm really struggling to get there. All I've managed is to create a parent-child link, like this:

parent_id	child_id	child_source
3	1	A
3	2	B

I can also return the IDs I want, but they aren't 'joined', so to speak. For example:

SELECT
    CASE WHEN child_source = 'A' THEN child_id END AS source_a_id,
    CASE WHEN child_source = 'B' THEN child_id END AS source_b_id

But this just gives me rows with empty cells where the data is 'missing'.

---Edit---

Using array_agg and array_to_string, I've gotten closer to what I need:

SELECT
    parent.id AS parent_id,
    ARRAY_TO_STRING(
        ARRAY_AGG(CASE WHEN child_source = 'A' THEN child.id END)
        , ','
    ) AS a_id,
    ARRAY_TO_STRING(
        ARRAY_AGG(CASE WHEN child_source = 'B' THEN child.id END)
        , ','
    ) AS b_id

But the format isn't quite right, because I can occasionally get multiple rows from each source, so I end up with a table that looks like this:

parent_id	a_id	b_id
3	1	2,4,5

In that case, I'd like to return a table that looks like this:

parent_id	a_id	b_id
3	1	2
3	1	4
3	1	5

Does anyone have any suggestions for getting the output I want? Many thanks.

Answer 1

Suppose we have this table:

 select * from t;
 id | source | merged_into_id 
----+--------+----------------
  1 | A      |              3
  2 | B      |              3
  3 | C      |               
  5 | B      |              3
  4 | B      |              3
(5 rows)
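
To reproduce this table locally, a setup script along the following lines should work. The column types and constraints are assumptions, since the question does not give the schema; only the table name t, the column names and the five rows come from the listing above.

```sql
-- Assumed schema: the original post does not state column types or constraints.
CREATE TABLE t (
    id             integer PRIMARY KEY,
    source         text NOT NULL,
    merged_into_id integer REFERENCES t (id)  -- NULL means "not merged into anything"
);

-- The five rows shown in the listing above.
INSERT INTO t (id, source, merged_into_id) VALUES
    (1, 'A', 3),
    (2, 'B', 3),
    (3, 'C', NULL),
    (5, 'B', 3),
    (4, 'B', 3);
```

The self-referencing foreign key is optional; it is only there to document that merged_into_id points back at the same table.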

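This should do the job. It selects the A rows and the B rows separately, then joins them on merged_into_id, so every A row is paired with every B row merged into the same target: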

-- Split the table by source, then pair A and B rows that share a merge target.
WITH b_source AS (SELECT * FROM t WHERE source = 'B'),
     a_source AS (SELECT * FROM t WHERE source = 'A')
SELECT merged_into_id, a_source.id AS a_id, b_source.id AS b_id
FROM a_source
INNER JOIN b_source USING (merged_into_id);

Result:

 merged_into_id | a_id | b_id 
----------------+------+------
              3 |    1 |    2
              3 |    1 |    5
              3 |    1 |    4
(3 rows)
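
The query above only pairs rows that share a merge target. The question also mentions A rows that are merged directly into a B row, so here is a rough sketch of one way to report both cases in a single result. It assumes merges are only one level deep (no chains such as A into B into C). Rows 6 and 7 are hypothetical and were added purely to illustrate a direct merge; the inline VALUES list stands in for the real table.

```sql
-- Sample data: rows 1-5 match the table above; rows 6 and 7 are hypothetical
-- and illustrate an A row merged directly into a B row.
WITH t (id, source, merged_into_id) AS (
    VALUES (1, 'A', 3),
           (2, 'B', 3),
           (3, 'C', NULL),
           (4, 'B', 3),
           (5, 'B', 3),
           (6, 'B', NULL),
           (7, 'A', 6)
)
-- Case 1: an A row and a B row were both merged into the same third row.
SELECT a.merged_into_id AS parent_id, a.id AS a_id, b.id AS b_id
FROM t AS a
JOIN t AS b ON b.merged_into_id = a.merged_into_id
WHERE a.source = 'A'
  AND b.source = 'B'
UNION ALL
-- Case 2: an A row was merged directly into a B row, so the B row is the parent.
SELECT b.id AS parent_id, a.id AS a_id, b.id AS b_id
FROM t AS a
JOIN t AS b ON b.id = a.merged_into_id
WHERE a.source = 'A'
  AND b.source = 'B'
ORDER BY parent_id, a_id, b_id;
```

With this sample data the result should be four rows: (3, 1, 2), (3, 1, 4), (3, 1, 5) from the first branch and (6, 7, 6) from the second. If B rows can also be merged directly into A rows, a third, mirrored branch would be needed.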


postgresql

join

relationship