Gaps and islands fails with 3 columns using SQL Server
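I've run into strange behavior with a gaps-and-islands solution. With 3 columns (where the third column is a non-integer), the results are effectively random. Suppose we have the following query: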
Declare @Table1 TABLE
(
ID varchar(50),
yr float,
CO1 varchar(50)
);
INSERT INTO @Table1 (ID, yr, CO1)
VALUES ('I2','2011','ABE'), ('I2','2012','ABE'), ('I2','2013','ABE'),
('I2','2014','ABE'), ('I2','2014','ABE'), ('I2','2005','ABD'),
('I2','2006','ABD'), ('I2','2007','ABD'), ('I2','2008','ABD'),
('I2','2007','ABA CD'), ('I2','2011','ABA CD'), ('I2','2013','ABA CD');
SELECT
ID, CO1, StartSeqNo = MIN(yr), EndSeqNo = MAX(yr)
FROM
(SELECT
ID, yr, CO1,
rn = yr - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY yr)
FROM
@Table1) a
GROUP BY
ID, CO1, rn ;
My target output is:
ID CO1 StartSeqNo EndSeqNo
----------------------------
I2 ABA CD 2007 2007
I2 ABA CD 2011 2011
I2 ABA CD 2013 2013
I2 ABD 2005 2008
I2 ABE 2011 2014
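I've looked on Stack Overflow and elsewhere to see whether I'm missing something. I've tried using DISTINCT and DENSE_RANK, and neither gives the correct result.
Here are the DISTINCT and DENSE_RANK queries I've already tried: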
--- distinct
SELECT distinct ID,CO1, StartSeqNo=MIN(yr), EndSeqNo=MAX(yr)
FROM (
SELECT distinct ID, yr, CO1
,rn=yr-ROW_NUMBER() OVER (PARTITION BY ID ORDER BY yr)
FROM @Table1) a
GROUP BY ID, CO1, rn ;
--- with dense_rank
SELECT ID,CO1, StartSeqNo=MIN(yr), EndSeqNo=MAX(yr)
FROM (
SELECT ID, yr, CO1
,rn=yr-dense_rank() OVER (PARTITION BY ID ORDER BY yr)
FROM @Table1) a
GROUP BY ID, CO1, rn ;
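I don't understand why the gaps-and-islands query doesn't work with a non-integer column. I think there is a problem with the grouping somewhere. Please help me solve this.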
You seem to want:
select id, co1, min(yr), max(yr)
from (select *, (case when max(grp) over(partition by co1) > 1 then grp else 1 end) as grp1
from (select *, yr - lag(yr, 1, yr) over (partition by id, co1 order by yr) as grp
from @Table1
) t
) t
group by id, co1, grp1;
You need DENSE_RANK, because you have multiple rows with the same ID/yr combination, and you need to add CO1 to the PARTITION BY:
SELECT
ID, CO1, StartSeqNo = MIN(yr), EndSeqNo = MAX(yr)
FROM
(SELECT
ID, yr, CO1,
rn = yr - dense_rank() OVER (PARTITION BY ID, CO1 ORDER BY yr)
FROM
@Table1) a
GROUP BY
ID, CO1, rn ;
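If there are no gaps, the years within each ID/CO1 group are consecutive numbers. You can compare them with a gapless numbering, which of course must also run consecutively within each ID/CO1 group, ordered by year. So, if you don't ORDER BY CO1 (before the year), you must also use CO1 in the PARTITION BY of the row-numbering function.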
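To make the partitioning argument concrete, here is a minimal sketch that shows the year-minus-number difference computed both ways, before any grouping. It inlines the sample rows from the question with a VALUES list so the batch is self-contained; the aliases grp_id_only and grp_id_co1 are illustrative names, not part of the original queries.

```sql
-- Sample rows from the question, inlined so the batch runs on its own.
-- grp_id_only and grp_id_co1 are illustrative aliases.
SELECT
    ID, CO1, yr,
    -- numbering restarts per ID only: years of other CO1 values interfere
    grp_id_only = yr - DENSE_RANK() OVER (PARTITION BY ID ORDER BY yr),
    -- numbering restarts per ID/CO1: only years of the same group count
    grp_id_co1  = yr - DENSE_RANK() OVER (PARTITION BY ID, CO1 ORDER BY yr)
FROM
    (VALUES ('I2', 2011, 'ABE'), ('I2', 2012, 'ABE'), ('I2', 2013, 'ABE'),
            ('I2', 2014, 'ABE'), ('I2', 2014, 'ABE'), ('I2', 2005, 'ABD'),
            ('I2', 2006, 'ABD'), ('I2', 2007, 'ABD'), ('I2', 2008, 'ABD'),
            ('I2', 2007, 'ABA CD'), ('I2', 2011, 'ABA CD'), ('I2', 2013, 'ABA CD')
    ) AS t (ID, yr, CO1)
ORDER BY
    ID, CO1, yr;
```

For the 'ABA CD' rows, grp_id_only comes out as 2004, 2006, 2006, so 2011 and 2013 would be merged into one island. With grp_id_co1 the same rows get 2006, 2009, 2010, one value per island, which is what the GROUP BY needs.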
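Additionally, your data contains duplicate rows, so to give equal years within an ID/CO1 group the same number, use a ranking function instead of ROW_NUMBER. RANK does that, but it skips numbers after a tie, so it only works when the duplicate is the last year of its island (as in this sample). DENSE_RANK covers the general case:
WITH a (ID, CO1, yr, nmbr) AS (
SELECT ID, CO1, yr
, yr - DENSE_RANK() OVER (PARTITION BY ID, CO1 ORDER BY yr)
FROM @Table1
)
SELECT ID, CO1, StartSeqNo = MIN(yr), EndSeqNo = MAX(yr)
FROM a
GROUP BY ID, CO1, nmbr;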
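The three numbering functions behave differently when a duplicate year sits in the middle of a run. The following sketch uses four made-up years (not taken from the question) with a single implicit group, and the column aliases are illustrative only.

```sql
-- Hypothetical data: one unbroken run 2011..2013 with 2012 duplicated.
SELECT
    yr,
    by_row_number = yr - ROW_NUMBER() OVER (ORDER BY yr),  -- numbers 1,2,3,4
    by_rank       = yr - RANK()       OVER (ORDER BY yr),  -- numbers 1,2,2,4
    by_dense_rank = yr - DENSE_RANK() OVER (ORDER BY yr)   -- numbers 1,2,2,3
FROM
    (VALUES (2011), (2012), (2012), (2013)) AS t (yr)
ORDER BY
    yr;
```

Only by_dense_rank stays constant (2010) across all four rows. by_row_number changes at the second 2012 and by_rank changes at 2013, so grouping on either of those would split the single island in two.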
Finally, let me suggest using int instead of float for the year numbers.
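Putting the suggestions together, a version with an int year column might look like the sketch below. The table variable name @Years and the alias grp are hypothetical; the rows are the ones from the question, inserted as integers rather than strings.

```sql
-- @Years is a hypothetical name; yr is declared as int instead of float.
DECLARE @Years TABLE
(
    ID  varchar(50),
    yr  int,
    CO1 varchar(50)
);

INSERT INTO @Years (ID, yr, CO1)
VALUES ('I2', 2011, 'ABE'), ('I2', 2012, 'ABE'), ('I2', 2013, 'ABE'),
       ('I2', 2014, 'ABE'), ('I2', 2014, 'ABE'), ('I2', 2005, 'ABD'),
       ('I2', 2006, 'ABD'), ('I2', 2007, 'ABD'), ('I2', 2008, 'ABD'),
       ('I2', 2007, 'ABA CD'), ('I2', 2011, 'ABA CD'), ('I2', 2013, 'ABA CD');

SELECT
    ID, CO1, StartSeqNo = MIN(yr), EndSeqNo = MAX(yr)
FROM
    (SELECT
        ID, CO1, yr,
        -- integer arithmetic: the difference is constant within an island
        grp = yr - DENSE_RANK() OVER (PARTITION BY ID, CO1 ORDER BY yr)
     FROM
        @Years) a
GROUP BY
    ID, CO1, grp
ORDER BY
    ID, CO1, StartSeqNo;
```

On the sample data this returns the five rows listed as the target in the question: three single-year islands for 'ABA CD', 2005 to 2008 for 'ABD', and 2011 to 2014 for 'ABE'.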