How to select both row_number and count over partition?

I need to find duplicate records (both the master record ID and the duplicate record IDs):

select ciid, name from (
select ciid, name, row_number() over (
  partition by related_id, name order by updatedate desc) rn
from tablename
) t where rn = 1;

This gives me the master record IDs, but it also includes records that have no duplicates.

If I use

select ciid, name from (
select ciid, name, row_number() over (
  partition by related_id, name order by updatedate desc) rn
from tablename
) t where rn > 1;

I get all of the duplicate records, but not the master records.
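To see the two results side by side, here is a minimal sketch that runs the row_number() query above through Python's built-in sqlite3 module. The table name `tablename`, the sample rows, and the helpers `make_db` and `ciids_where` are hypothetical, and it assumes the bundled SQLite is 3.25 or newer (the first release with window functions). Groups (10, 'a') and (20, 'c') contain duplicates; ciid 3, 4 and 8 are singletons.

```python
import sqlite3

# Hypothetical sample data: (ciid, related_id, name, updatedate).
ROWS = [
    (1, 10, "a", "2020-01-01"),
    (2, 10, "a", "2020-02-01"),
    (3, 10, "b", "2020-01-15"),
    (4, 20, "a", "2020-03-01"),
    (5, 20, "c", "2020-01-01"),
    (6, 20, "c", "2020-01-05"),
    (7, 20, "c", "2020-01-03"),
    (8, 30, "a", "2020-02-01"),
]


def make_db():
    # In-memory table standing in for the real one ("tablename" is a placeholder).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tablename ("
        "ciid integer primary key, related_id integer, name text, updatedate text)"
    )
    conn.executemany("insert into tablename values (?, ?, ?, ?)", ROWS)
    return conn


def ciids_where(conn, condition):
    # Number the rows of each (related_id, name) group, newest updatedate first,
    # then keep the rows whose rn satisfies `condition`.
    sql = f"""
        select ciid from (
          select ciid, name, row_number() over (
            partition by related_id, name order by updatedate desc) rn
          from tablename
        ) t
        where {condition}
        order by ciid
    """
    return [row[0] for row in conn.execute(sql)]


conn = make_db()
latest_per_group = ciids_where(conn, "rn = 1")  # masters AND singletons
older_duplicates = ciids_where(conn, "rn > 1")  # duplicates only, no masters
print(latest_per_group)  # [2, 3, 4, 6, 8]
print(older_duplicates)  # [1, 5, 7]
```

The rows actually wanted are ciid 2 and 6: the newest row of each group that has more than one member.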

I was hoping I could do something like this:

select ciid, name from (
select ciid, name, row_number() over (
    partition by related_id, name order by updatedate desc
  ) rn, count(*) over (
    partition by related_id, name order by updatedate desc
  ) cnt
from tablename
) t where rn = 1 and cnt > 1;

But I'm worried about performance, and even about whether it actually does what I want.
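On the question of whether that attempt does what is intended: when count(*) over (...) has an order by and no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so cnt is a running count rather than the group size. On the row where rn = 1 it is therefore 1 (unless several rows tie on updatedate). The sketch below uses sqlite3 with hypothetical sample data and the placeholder table name `tablename` (SQLite 3.25+ assumed) and prints rn and cnt for every row.

```python
import sqlite3

# Hypothetical sample data: (ciid, related_id, name, updatedate).
ROWS = [
    (1, 10, "a", "2020-01-01"),
    (2, 10, "a", "2020-02-01"),
    (3, 10, "b", "2020-01-15"),
    (4, 20, "a", "2020-03-01"),
    (5, 20, "c", "2020-01-01"),
    (6, 20, "c", "2020-01-05"),
    (7, 20, "c", "2020-01-03"),
    (8, 30, "a", "2020-02-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename ("
    "ciid integer primary key, related_id integer, name text, updatedate text)"
)
conn.executemany("insert into tablename values (?, ?, ?, ?)", ROWS)

# The attempt from the question: count(*) also has ORDER BY, so it is a running count.
INNER = """
    select ciid, name,
      row_number() over (partition by related_id, name order by updatedate desc) rn,
      count(*) over (partition by related_id, name order by updatedate desc) cnt
    from tablename
"""

rows = conn.execute(f"select ciid, rn, cnt from ({INNER}) t order by ciid").fetchall()
for row in rows:
    print(row)  # on every rn = 1 row, cnt is 1

masters = [r[0] for r in conn.execute(
    f"select ciid from ({INNER}) t where rn = 1 and cnt > 1 order by ciid"
)]
print(masters)  # []
```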

How can I get only the master records of the duplicates? Note that name is not a unique column; only ciid is unique.

select ciid, name 
from (
select ciid, name,
dense_rank() over (partition by related_id, name order by updatedate desc) rn
from tablename) t
group by ciid,name
having count(distinct rn) > 1;
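One thing worth checking with this query: since ciid is unique, group by ciid, name puts every row into a group of its own, so count(distinct rn) is always 1 and the having clause filters out every row. A minimal sqlite3 check, using hypothetical sample data and the placeholder table `tablename` (SQLite 3.25+ assumed):

```python
import sqlite3

# Hypothetical sample data: (ciid, related_id, name, updatedate).
ROWS = [
    (1, 10, "a", "2020-01-01"),
    (2, 10, "a", "2020-02-01"),
    (3, 10, "b", "2020-01-15"),
    (4, 20, "a", "2020-03-01"),
    (5, 20, "c", "2020-01-01"),
    (6, 20, "c", "2020-01-05"),
    (7, 20, "c", "2020-01-03"),
    (8, 30, "a", "2020-02-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename ("
    "ciid integer primary key, related_id integer, name text, updatedate text)"
)
conn.executemany("insert into tablename values (?, ?, ?, ?)", ROWS)

INNER = """
    select ciid, name,
      dense_rank() over (partition by related_id, name order by updatedate desc) rn
    from tablename
"""

# The query as written in the answer.
result = conn.execute(
    f"select ciid, name from ({INNER}) t group by ciid, name having count(distinct rn) > 1"
).fetchall()
print(result)  # []

# Why: each (ciid, name) group holds exactly one row, so the distinct count is 1.
per_group = conn.execute(
    f"select ciid, name, count(distinct rn) from ({INNER}) t group by ciid, name order by ciid"
).fetchall()
print(per_group)
```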

Edit: to find the duplicates, why not just do this:

select x.ciid, x.name, x.updatedate
from tablename x join
(
select name, related_id, max(updatedate) as mxdt, count(*) as cnt
from tablename
group by name, related_id
having count(*) > 1
) t
on x.updatedate = t.mxdt and x.name = t.name and x.related_id = t.related_id;
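A sketch of the join approach in sqlite3, with hypothetical sample data and the placeholder table `tablename`. It compares joining on name and updatedate only with also joining on related_id: row 8 has the same name and updatedate as the newest row of group (10, 'a') but a different related_id, so it is excluded only when related_id is part of the join condition.

```python
import sqlite3

# Hypothetical sample data: (ciid, related_id, name, updatedate).
ROWS = [
    (1, 10, "a", "2020-01-01"),
    (2, 10, "a", "2020-02-01"),
    (3, 10, "b", "2020-01-15"),
    (4, 20, "a", "2020-03-01"),
    (5, 20, "c", "2020-01-01"),
    (6, 20, "c", "2020-01-05"),
    (7, 20, "c", "2020-01-03"),
    (8, 30, "a", "2020-02-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename ("
    "ciid integer primary key, related_id integer, name text, updatedate text)"
)
conn.executemany("insert into tablename values (?, ?, ?, ?)", ROWS)

# Latest updatedate of every (name, related_id) group with more than one row,
# joined back to the table to recover the ciid of that latest row.
BASE = """
    select x.ciid
    from tablename x
    join (
      select name, related_id, max(updatedate) as mxdt, count(*) as cnt
      from tablename
      group by name, related_id
      having count(*) > 1
    ) t
      on x.updatedate = t.mxdt and x.name = t.name
"""

with_related = [r[0] for r in conn.execute(BASE + " and x.related_id = t.related_id order by x.ciid")]
without_related = [r[0] for r in conn.execute(BASE + " order by x.ciid")]
print(with_related)     # [2, 6]
print(without_related)  # [2, 6, 8]  (row 8 is a false match)
```

If two rows in the same group share the latest updatedate, this join returns both of them, whereas row_number() picks only one.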

You can use group by with having to select only those ids that have more than one row with the same row number.

I haven't tested this (I don't have real data and am too lazy to create some), but something along these lines might work:

with has_duplicates as (
  select related_id, name
  from yourtable
  group by related_id, name
  having count(*) > 1
),
with_dupes as (
  select
    y.ciid, y.name,
    row_number() over (partition by y.related_id, y.name order by y.updatedate desc) rn
  from
    yourtable y
    join has_duplicates d
      on y.related_id = d.related_id
     and y.name = d.name
)
select
  ciid, name
from with_dupes
where rn = 1;
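The CTE version can be checked the same way. This sketch uses sqlite3, hypothetical sample data, the placeholder table name `tablename` in place of `yourtable`, and the `ciid` column; only groups that actually have duplicates reach the row_number() step, so rn = 1 yields just their master rows.

```python
import sqlite3

# Hypothetical sample data: (ciid, related_id, name, updatedate).
ROWS = [
    (1, 10, "a", "2020-01-01"),
    (2, 10, "a", "2020-02-01"),
    (3, 10, "b", "2020-01-15"),
    (4, 20, "a", "2020-03-01"),
    (5, 20, "c", "2020-01-01"),
    (6, 20, "c", "2020-01-05"),
    (7, 20, "c", "2020-01-03"),
    (8, 30, "a", "2020-02-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename ("
    "ciid integer primary key, related_id integer, name text, updatedate text)"
)
conn.executemany("insert into tablename values (?, ?, ?, ?)", ROWS)

SQL = """
    with has_duplicates as (
      select related_id, name
      from tablename
      group by related_id, name
      having count(*) > 1
    ),
    with_dupes as (
      select y.ciid, y.name,
        row_number() over (partition by y.related_id, y.name order by y.updatedate desc) rn
      from tablename y
      join has_duplicates d
        on y.related_id = d.related_id and y.name = d.name
    )
    select ciid, name from with_dupes where rn = 1 order by ciid
"""

masters = conn.execute(SQL).fetchall()
print(masters)  # [(2, 'a'), (6, 'c')]
```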

I ended up using a query similar to the one in my question:

select ciid, name from (
select ciid, name, row_number() over (
    partition by related_id, name order by updatedate desc
  ) rn, count(*) over (
    partition by related_id, name
  ) cnt
from tablename
) t where rn = 1 and cnt > 1;

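It works surprisingly well. The master record is rn = 1 and the duplicates are rn > 1. Make sure count(*) over (partition ...) has no order by clause: with an order by, the default window frame ends at the current row, so count(*) becomes a running count instead of the size of the whole partition.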
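A sketch of this final query in sqlite3, with hypothetical sample data and the placeholder table `tablename` (SQLite 3.25+ assumed). Without an order by, count(*) over (partition by related_id, name) is the size of the whole group, so rn = 1 and cnt > 1 keeps exactly the newest row of each group that has duplicates, and rn > 1 gives the duplicates themselves.

```python
import sqlite3

# Hypothetical sample data: (ciid, related_id, name, updatedate).
ROWS = [
    (1, 10, "a", "2020-01-01"),
    (2, 10, "a", "2020-02-01"),
    (3, 10, "b", "2020-01-15"),
    (4, 20, "a", "2020-03-01"),
    (5, 20, "c", "2020-01-01"),
    (6, 20, "c", "2020-01-05"),
    (7, 20, "c", "2020-01-03"),
    (8, 30, "a", "2020-02-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename ("
    "ciid integer primary key, related_id integer, name text, updatedate text)"
)
conn.executemany("insert into tablename values (?, ?, ?, ?)", ROWS)

INNER = """
    select ciid, name,
      row_number() over (partition by related_id, name order by updatedate desc) rn,
      -- no ORDER BY here: cnt is the size of the whole (related_id, name) group
      count(*) over (partition by related_id, name) cnt
    from tablename
"""

masters = conn.execute(
    f"select ciid, name from ({INNER}) t where rn = 1 and cnt > 1 order by ciid"
).fetchall()
dupes = [r[0] for r in conn.execute(f"select ciid from ({INNER}) t where rn > 1 order by ciid")]
print(masters)  # [(2, 'a'), (6, 'c')]
print(dupes)    # [1, 5, 7]
```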