Oracle self join copying column value over different values in the same column

Question

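I am trying to clean up some nearly duplicate data. I am doing a self join to find records where every column except one is equal, so that I can pick the best duplicate to remove from the table. The problem I am running into is that, although the number of records returned is correct, I only see one of the id column values repeated over and over. When I look at all the values associated with that user, that id column value appears only once.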

I know that is not very clear, so hopefully this helps.

Id1    ID2    AnotherColumn    AnotherColumn2
---------------------------------------------

1      345       "a"                "bd"
2      345       "a"                "bd"
3      345       "a"                "bd"
4      345       "a"                "bd"
5      345       "a"                "bd"

What I want to get back is everything you see in this dummy table. What I get instead is:

Id1    ID2    AnotherColumn    AnotherColumn2
---------------------------------------------

1      345       "a"                "bd"
1      345       "a"                "bd"
1      345       "a"                "bd"
1      345       "a"                "bd"
1      345       "a"                "bd"

The query I am using looks like this:

select A.Id1, A.ID2, A.AnotherColumn, A.AnotherColumn2
from dummy_table A, dummy_table B
where A.ID2 = B.ID2
AND A.Id1 <> B.Id1
AND A.AnotherColumn = B.AnotherColumn
AND A.AnotherColumn2 = B.AnotherColumn2

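What I would like to know is why the value of Id1 is being copied to the other rows instead of the original Id1 values being shown.

I need a list of the IDs in the table that match these criteria, because I have to delete them from the original table. That table also contains other records that do not match these criteria and must stay untouched.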

Answer 1

I think this does what you want:

select min(A.Id1) over (partition by A.ID2, A.AnotherColumn, A.AnotherColumn2) as id,
       A.ID2, A.AnotherColumn, A.AnotherColumn2
from dummy_table A;

This returns the minimum id for each combination of the columns in the partition by clause.
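To turn that minimum into a list of candidate rows to remove, one option is to compare each row's Id1 with the minimum of its group. The following is only a sketch under assumptions: the inline sample rows, the extra row with id1 = 6, and the alias min_id1 are made up for illustration and are not from the question.

```sql
-- Sketch only: inline sample data stands in for the real dummy_table.
with dummy_table as (
  select 1 id1, 345 id2, 'a' anothercolumn, 'bd' anothercolumn2 from dual union all
  select 2, 345, 'a', 'bd' from dual union all
  select 3, 345, 'a', 'bd' from dual union all
  select 6, 345, 'b', 'bd' from dual   -- hypothetical row with no duplicate
)
select id1, min_id1
from  (select id1,
              -- lowest id1 within each (id2, anothercolumn, anothercolumn2) group
              min(id1) over (partition by id2, anothercolumn, anothercolumn2) as min_id1
       from   dummy_table)
where  id1 <> min_id1   -- keep only the rows that are not their group's minimum
order by id1;
```

With this sample data the query lists id1 values 2 and 3, each paired with min_id1 = 1. The row with id1 = 6 is the minimum of its own group, so it is not listed.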

Answer 2

When I run your query, I get 20 rows: 4 for each id1 value (that is, 4 x 5, because you are effectively doing a cross join and only excluding the rows where a.id1 = b.id1).

with dummy_table as (select 1 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual union all
                     select 2 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual union all
                     select 3 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual union all
                     select 4 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual union all
                     select 5 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual)
select A.Id1, A.ID2, A.AnotherColumn, A.AnotherColumn2
from dummy_table A, dummy_table B
where A.ID2 = B.ID2
AND A.Id1 <> B.Id1
AND A.AnotherColumn = B.AnotherColumn
AND A.AnotherColumn2 = B.AnotherColumn2
order by 1, 2, 3, 4


       ID1        ID2 ANOTHERCOLUMN ANOTHERCOLUMN2
---------- ---------- ------------- --------------
         1        345 a             bd            
         1        345 a             bd            
         1        345 a             bd            
         1        345 a             bd            
         2        345 a             bd            
         2        345 a             bd            
         2        345 a             bd            
         2        345 a             bd            
         3        345 a             bd            
         3        345 a             bd            
         3        345 a             bd            
         3        345 a             bd            
         4        345 a             bd            
         4        345 a             bd            
         4        345 a             bd            
         4        345 a             bd            
         5        345 a             bd            
         5        345 a             bd            
         5        345 a             bd            
         5        345 a             bd
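The pairing behind those repeated rows is easier to see when the B side of the join is selected as well. The sketch below is an illustration under assumptions: it uses only three inline sample rows, and the aliases a_id1 and b_id1 are hypothetical. The second query shows one possible way to list each duplicated id1 once, using EXISTS instead of a join.

```sql
-- Query 1: expose both sides of the self join.
-- Each A row is paired with every other matching B row.
with dummy_table as (
  select 1 id1, 345 id2, 'a' anothercolumn, 'bd' anothercolumn2 from dual union all
  select 2, 345, 'a', 'bd' from dual union all
  select 3, 345, 'a', 'bd' from dual
)
select a.id1 as a_id1, b.id1 as b_id1
from   dummy_table a
join   dummy_table b
  on   a.id2            = b.id2
 and   a.anothercolumn  = b.anothercolumn
 and   a.anothercolumn2 = b.anothercolumn2
 and   a.id1           <> b.id1
order by a.id1, b.id1;

-- Query 2: list each id1 once if at least one other matching row exists.
with dummy_table as (
  select 1 id1, 345 id2, 'a' anothercolumn, 'bd' anothercolumn2 from dual union all
  select 2, 345, 'a', 'bd' from dual union all
  select 3, 345, 'a', 'bd' from dual
)
select a.id1
from   dummy_table a
where  exists (select 1
               from   dummy_table b
               where  b.id2            = a.id2
               and    b.anothercolumn  = a.anothercolumn
               and    b.anothercolumn2 = a.anothercolumn2
               and    b.id1           <> a.id1)
order by a.id1;
```

With three matching rows, the first query returns 3 x 2 = 6 rows, one per (a_id1, b_id1) pair. The second returns each of the three id1 values once, because EXISTS only tests whether a partner row is present and does not multiply the result.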

However, I wonder whether you are looking for something like this:

with dummy_table as (select 1 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual union all
                     select 2 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual union all
                     select 3 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual union all
                     select 4 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual union all
                     select 5 id1, 345 ID2, 'a' AnotherColumn, 'bd' AnotherColumn2 from dual union all
                     select 6 id1, 345 ID2, 'b' AnotherColumn, 'bd' AnotherColumn2 from dual)
select id1,
       id2,
       anothercolumn,
       anothercolumn2
from   (select id1,
               id2,
               anothercolumn,
               anothercolumn2,
               count(*) over (partition by id2, anothercolumn, anothercolumn2) cnt
        from   dummy_table)
where  cnt > 1;

       ID1        ID2 ANOTHERCOLUMN ANOTHERCOLUMN2
---------- ---------- ------------- --------------
         1        345 a             bd            
         2        345 a             bd            
         3        345 a             bd            
         4        345 a             bd            
         5        345 a             bd

You may not need analytic functions at all - to delete every row except the one with the lowest id1 in each group, you could do the following:

delete from dummy_table
where id1 not in (select min(id1) from dummy_table group by id2, anothercolumn, anothercolumn2);
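Before running a delete like this, it can help to preview which rows it would remove by putting the same predicate into a select. This is a minimal sketch, assuming id1 is never NULL; the inline sample rows, including the one with id1 = 6, are made up for illustration.

```sql
-- Sketch only: same NOT IN predicate as the delete, used in a select.
with dummy_table as (
  select 1 id1, 345 id2, 'a' anothercolumn, 'bd' anothercolumn2 from dual union all
  select 2, 345, 'a', 'bd' from dual union all
  select 3, 345, 'a', 'bd' from dual union all
  select 6, 345, 'b', 'bd' from dual   -- hypothetical row with no duplicate
)
select id1, id2, anothercolumn, anothercolumn2
from   dummy_table
where  id1 not in (select min(id1)     -- the row to keep in each group
                   from   dummy_table
                   group by id2, anothercolumn, anothercolumn2)
order by id1;
```

With this sample data the preview lists the rows with id1 = 2 and id1 = 3. The rows with id1 = 1 and id1 = 6 are the minimum of their groups and would be kept.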

