How to select both row_number and count over partition?
I need to find duplicate records (with both the master record ID and the duplicate record IDs):
select ciid, name from (
  select ciid, name, row_number() over (
    partition by related_id, name order by updatedate desc) rn
  from tablename
) t where rn = 1;
This gives me the master record IDs, but it also includes records that have no duplicates.
If I use
select ciid, name from (
  select ciid, name, row_number() over (
    partition by related_id, name order by updatedate desc) rn
  from tablename
) t where rn > 1;
this gets me all the duplicate records, but not the master records.
I wish I could do something like this:
select ciid, name from (
  select ciid, name, row_number() over (
    partition by related_id, name order by updatedate desc
  ) rn, count(*) over (
    partition by related_id, name order by updatedate desc
  ) cnt
  from tablename
) t where rn = 1 and cnt > 1;
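But I am worried about performance, and even about whether it actually does what I want.
How can I get only the master records of the groups that have duplicates? Note that name is not a unique column. Only ciid is unique.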
select ciid, name
from (
select ciid, name,
dense_rank() over (partition by related_id, name order by updatedate desc) rn
from tablename) t
group by ciid,name
having count(distinct rn) > 1;
Edit: to find the duplicates, why not simply do this:
select x.ciid, x.name, x.updatedate
from tablename x join
(
  select name, related_id, max(updatedate) as mxdt, count(*) as cnt
  from tablename
  group by name, related_id
  having count(*) > 1
) t
on x.updatedate = t.mxdt and x.name = t.name and x.related_id = t.related_id
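To try the aggregate-and-join idea on a few rows, here is a small sketch. It makes several assumptions that are not in the thread: it runs on an in-memory SQLite database through Python's sqlite3 module (the question never says which database is used), it stores dates as ISO text, and the sample rows and the function name masters_via_join are made up for illustration. The join matches on related_id as well as name, so that a row from a different group that happens to share the same name and date is not picked up.

```python
import sqlite3

# Hypothetical sample rows: (ciid, related_id, name, updatedate).
SAMPLE_ROWS = [
    (1, 10, "alpha", "2020-01-01"),
    (2, 10, "alpha", "2020-03-01"),  # newest row of the (10, alpha) group
    (3, 20, "beta", "2020-01-15"),   # group of one, not a duplicate
    (4, 40, "alpha", "2020-03-01"),  # group of one, same name and date as ciid 2
]


def masters_via_join(rows):
    """Return the newest row of every (related_id, name) group with more than one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tablename ("
        "ciid integer primary key, related_id integer, name text, updatedate text)"
    )
    conn.executemany("insert into tablename values (?, ?, ?, ?)", rows)
    sql = """
        select x.ciid, x.name, x.updatedate
        from tablename x
        join (
            select name, related_id, max(updatedate) as mxdt
            from tablename
            group by name, related_id
            having count(*) > 1
        ) t
          on x.updatedate = t.mxdt
         and x.name = t.name
         and x.related_id = t.related_id
        order by x.ciid
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(masters_via_join(SAMPLE_ROWS))
```

One thing to keep in mind with this approach: if two rows in the same group share the latest updatedate, the join returns both of them, whereas row_number() would pick exactly one.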
You can do a group by with having to select only those ids that have more than one row with the same row number.
I haven't tested this (because I don't have real data and am too lazy to create some), but something along these lines might work:
with has_duplicates as (
  -- groups that contain more than one row
  select related_id, name
  from yourtable
  group by related_id, name
  having count(*) > 1
),
with_dupes as (
  -- number the rows of those groups, newest first
  select
    y.ciid, y.name,
    row_number() over (partition by y.related_id, y.name order by y.updatedate desc) rn
  from
    yourtable y,
    has_duplicates d
  where
    y.related_id = d.related_id and
    y.name = d.name
)
select
  ciid, name
from with_dupes
where rn = 1
I ended up using a query similar to the one in my question:
select ciid, name from (
  select ciid, name, row_number() over (
    partition by related_id, name order by updatedate desc
  ) rn, count(*) over (
    partition by related_id, name
  ) cnt
  from tablename
) t where rn = 1 and cnt > 1;
It works surprisingly well. The master record is the one with rn = 1, and the duplicates have rn > 1. Make sure that count(*) over (partition ..) has no order by clause.
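The reason the order by matters: once a window has an order by and no explicit frame, the default frame runs from the start of the partition up to the current row, so count(*) becomes a running count rather than the size of the whole partition. The sketch below illustrates this under assumptions of mine, not the poster's: it uses an in-memory SQLite database via Python's sqlite3 module (window functions need SQLite 3.25 or later), ISO text dates, and invented sample rows and helper names.

```python
import sqlite3

# Hypothetical sample rows: (ciid, related_id, name, updatedate).
ROWS = [
    (1, 10, "alpha", "2020-01-01"),
    (2, 10, "alpha", "2020-03-01"),  # newest of the (10, alpha) group: master
    (3, 10, "alpha", "2020-02-01"),
    (4, 20, "beta", "2020-01-15"),   # no duplicates, should be excluded
    (5, 30, "alpha", "2020-01-10"),
    (6, 30, "alpha", "2020-01-20"),  # newest of the (30, alpha) group: master
]


def make_db():
    """Create an in-memory table named like the one in the thread and load ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tablename ("
        "ciid integer primary key, related_id integer, name text, updatedate text)"
    )
    conn.executemany("insert into tablename values (?, ?, ?, ?)", ROWS)
    return conn


def masters_with_duplicates(conn):
    """rn = 1 picks the newest row per group; cnt > 1 keeps only groups with duplicates."""
    sql = """
        select ciid, name from (
            select ciid, name,
                   row_number() over (
                       partition by related_id, name order by updatedate desc) rn,
                   count(*) over (partition by related_id, name) cnt
            from tablename
        ) t
        where rn = 1 and cnt > 1
        order by ciid
    """
    return conn.execute(sql).fetchall()


def counts_for_newest_row(conn, with_order_by):
    """Return (ciid, cnt) for each group's newest row, with or without order by in count."""
    order = "order by updatedate desc" if with_order_by else ""
    sql = f"""
        select ciid, cnt from (
            select ciid,
                   row_number() over (
                       partition by related_id, name order by updatedate desc) rn,
                   count(*) over (partition by related_id, name {order}) cnt
            from tablename
        ) t
        where rn = 1
        order by ciid
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(masters_with_duplicates(conn))
    print(counts_for_newest_row(conn, with_order_by=False))
    print(counts_for_newest_row(conn, with_order_by=True))
```

With these rows, the count without order by is the full group size for every newest row, while the count with order by is 1 for each of them, so a cnt > 1 filter would reject every master record.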