How to number rows with ordering, partitioning and grouping

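I need to number rows using ordering, partitioning and grouping: order by IdDocument, DateChange; partition by IdDocument; group by IdRole. The grouping is the hard part. Judging by the example (NumberingExpected), DENSE_RANK() looks like the best function for the job, but it repeats a number only when the values used for ordering are equal. In my case the ordering values (IdDocument, DateChange) are always different, and the repetition of a number has to be driven by IdRole.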

Of course this is easy to solve with a cursor. But is there a way to do it with the numbering/ranking functions?

Test data:

declare @LogTest as table (
    Id INT
    ,IdRole INT
    ,DateChange DATETIME
    ,IdDocument INT
    ,NumberingExpected INT
)
insert into @LogTest
select 1 as Id, 7 as IdRole, GETDATE() as DateChange, 13 as IdDocument, 1 as NumberingExpected
union 
select 2, 3, DATEADD(HH, 1, GETDATE()), 13, 2
union 
select 3, 3, DATEADD(HH, 2, GETDATE()), 13, 2
union 
select 4, 3, DATEADD(HH, 3, GETDATE()), 13, 2
union 
select 5, 5, DATEADD(HH, 4, GETDATE()), 13, 3
union 
select 7, 3, DATEADD(HH, 6, GETDATE()), 13, 4
union 
select 6, 3, DATEADD(HH, 5, GETDATE()), 27, 1
union 
select 8, 3, DATEADD(HH, 7, GETDATE()), 27, 1
union 
select 9, 5, DATEADD(HH, 8, GETDATE()), 27, 2
union 
select 10, 3, DATEADD(HH, 9, GETDATE()), 27, 3


select * from @LogTest order by IdDocument, DateChange;
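To make the DENSE_RANK() problem concrete, here is a small sketch that runs the two obvious attempts against the test data. It assumes it is executed in the same batch as the @LogTest declaration above; the column aliases RankByDate and RankByRole are made up for illustration.

```sql
SELECT
    Id, IdRole, IdDocument, NumberingExpected
    -- Ordering by DateChange: every value is distinct within a document,
    -- so this simply counts 1, 2, 3, ... and never repeats a number.
    ,DENSE_RANK() OVER (PARTITION BY IdDocument ORDER BY DateChange) AS RankByDate
    -- Ordering by IdRole: equal roles share a number, but separate runs of the
    -- same role are merged and the numbers follow the role value, not time.
    ,DENSE_RANK() OVER (PARTITION BY IdDocument ORDER BY IdRole) AS RankByRole
FROM @LogTest
ORDER BY IdDocument, DateChange;
```

Neither column matches NumberingExpected, which is why the numbering has to react to a change of IdRole between consecutive rows rather than to the ordering values themselves.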

The logic as step-by-step pseudocode:

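  1. Sort the data by IdDocument, DateChange.
  2. Set the number of the first row to i = 1 and go to the next row.
  3. If IdDocument has changed { i = 1; } else { if IdRole has changed { i++; } }
  4. Set the row number to i;
  5. Go to the next row;
  6. If EOF { exit; } else { go to step 3; }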
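
The steps above map almost one-to-one onto a cursor loop. The following is only a sketch of that baseline, not a recommendation. It assumes the @LogTest table variable is declared in the same batch; the names @Result, log_cursor and Numbering are made up for illustration.

```sql
DECLARE @Result TABLE (Id INT, Numbering INT);
DECLARE @Id INT, @IdRole INT, @IdDocument INT;
DECLARE @PrevRole INT, @PrevDocument INT, @i INT;

-- Step 1: read the rows sorted by IdDocument, DateChange.
DECLARE log_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT Id, IdRole, IdDocument
    FROM @LogTest
    ORDER BY IdDocument, DateChange;

OPEN log_cursor;
FETCH NEXT FROM log_cursor INTO @Id, @IdRole, @IdDocument;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Steps 2 and 3: restart at 1 for the first row and for every new document,
    -- otherwise increment only when the role differs from the previous row.
    IF @PrevDocument IS NULL OR @IdDocument <> @PrevDocument
        SET @i = 1;
    ELSE IF @IdRole <> @PrevRole
        SET @i = @i + 1;

    -- Step 4: store the number for this row.
    INSERT INTO @Result (Id, Numbering) VALUES (@Id, @i);

    -- Steps 5 and 6: remember the current row and move on until no rows are left.
    SELECT @PrevRole = @IdRole, @PrevDocument = @IdDocument;
    FETCH NEXT FROM log_cursor INTO @Id, @IdRole, @IdDocument;
END;

CLOSE log_cursor;
DEALLOCATE log_cursor;

SELECT L.*, R.Numbering
FROM @LogTest AS L
    INNER JOIN @Result AS R ON R.Id = L.Id
ORDER BY L.IdDocument, L.DateChange;
```

The set-based answers compute the same thing without the row-by-row loop.
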
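
-- Requires SQL Server 2012 or later (LAG and a windowed SUM with ORDER BY).
WITH RankByIdDocumentAndDataChanged AS
(
    SELECT *, 
        -- DIFF is 0 when the row has the same IdRole as the previous row of the
        -- document, otherwise 1 (including the first row, where LAG returns NULL).
        CASE 
             IdRole - LAG(IdRole) OVER (PARTITION BY IdDocument ORDER BY DateChange) 
             WHEN 0 THEN 0 
             ELSE 1 
        END AS DIFF
    FROM @LogTest
)
-- The running total of DIFF within each document is the required numbering.
SELECT *, SUM(DIFF) OVER (PARTITION BY IdDocument ORDER BY DateChange) AS Numbering
FROM RankByIdDocumentAndDataChanged 
ORDER BY Id;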

This may not be pretty, but it does produce the desired output.

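-- Note: the Id predicates rely on Id increasing with DateChange within each IdDocument.
;WITH cte AS (
    SELECT l.Id, l.IdRole, l.IdDocument, l.NumberingExpected, l.DateChange,
        -- DateChange2 = start of the current run: the earliest DateChange with the same
        -- role after the last row that had a different role. It is NULL for the first
        -- run of a document, which DENSE_RANK sorts first.
        (SELECT MIN(x.DateChange) FROM @LogTest x
         WHERE x.IdDocument = l.IdDocument AND x.IdRole = l.IdRole AND x.Id <= l.Id
           AND x.Id > (SELECT MAX(y.Id) FROM @LogTest y
                       WHERE y.IdDocument = l.IdDocument AND y.IdRole <> l.IdRole AND y.Id <= l.Id)) AS DateChange2
    FROM @LogTest l
)
SELECT c.Id, c.IdRole, c.DateChange, c.IdDocument, c.NumberingExpected,
    DENSE_RANK() OVER (PARTITION BY c.IdDocument ORDER BY c.DateChange2) AS rn
FROM cte c
ORDER BY c.IdDocument, c.DateChange;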

With more time, I think the x.id predicates in the CTE could be improved.

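LAG/LEAD are available starting with SQL Server 2012, but not in SQL Server 2008, so here we emulate them. Performance may be poor; you should test it against your real data.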

Here is the final query:

WITH
CTE_rn
AS
(
    SELECT
        Main.IdRole
        ,Main.IdDocument
        ,Main.DateChange
        ,ROW_NUMBER() OVER(PARTITION BY Main.IdDocument ORDER BY Main.DateChange) AS rn
    FROM
        @LogTest AS Main
        OUTER APPLY
        (
            SELECT TOP (1) T.IdRole
            FROM @LogTest AS T
            WHERE
                T.IdDocument = Main.IdDocument
                AND T.DateChange < Main.DateChange
            ORDER BY T.DateChange DESC
        ) AS Prev
    WHERE Main.IdRole <> Prev.IdRole OR Prev.IdRole IS NULL
)
SELECT *
FROM
    @LogTest AS LT
    CROSS APPLY
    (
        SELECT TOP(1) CTE_rn.rn
        FROM CTE_rn
        WHERE
            CTE_rn.IdDocument = LT.IdDocument
            AND CTE_rn.IdRole = LT.IdRole
            AND CTE_rn.DateChange <= LT.DateChange
        ORDER BY CTE_rn.DateChange DESC
    ) CA_rn
ORDER BY IdDocument, DateChange;

Final result set:

Id    IdRole    DateChange                 IdDocument    NumberingExpected    rn
1     7         2015-01-26 20:00:41.210    13            1                    1
2     3         2015-01-26 21:00:41.210    13            2                    2
3     3         2015-01-26 22:00:41.210    13            2                    2
4     3         2015-01-26 23:00:41.210    13            2                    2
5     5         2015-01-27 00:00:41.210    13            3                    3
7     3         2015-01-27 02:00:41.210    13            4                    4
6     3         2015-01-27 01:00:41.210    27            1                    1
8     3         2015-01-27 03:00:41.210    27            1                    1
9     5         2015-01-27 04:00:41.210    27            2                    2
10    3         2015-01-27 05:00:41.210    27            3                    3

How it works

1) With the table ordered by IdDocument and DateChange, we need the IdRole value of the previous row. To get it we use OUTER APPLY (because LAG is not available):

SELECT *
FROM
    @LogTest AS Main
    OUTER APPLY
    (
        SELECT TOP (1) T.IdRole
        FROM @LogTest AS T
        WHERE
            T.IdDocument = Main.IdDocument
            AND T.DateChange < Main.DateChange
        ORDER BY T.DateChange DESC
    ) AS Prev
ORDER BY Main.IdDocument, Main.DateChange;

This is the result set of the first step:

Id    IdRole    DateChange                 IdDocument    NumberingExpected    IdRole
1     7         2015-01-26 20:50:32.560    13            1                    NULL
2     3         2015-01-26 21:50:32.560    13            2                    7
3     3         2015-01-26 22:50:32.560    13            2                    3
4     3         2015-01-26 23:50:32.560    13            2                    3
5     5         2015-01-27 00:50:32.560    13            3                    3
7     3         2015-01-27 02:50:32.560    13            4                    5
6     3         2015-01-27 01:50:32.560    27            1                    NULL
8     3         2015-01-27 03:50:32.560    27            1                    3
9     5         2015-01-27 04:50:32.560    27            2                    3
10    3         2015-01-27 05:50:32.560    27            3                    5
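
For comparison, on SQL Server 2012 or later the same previous-row lookup could presumably be written with LAG instead of OUTER APPLY. This is a sketch only, assuming @LogTest is declared in the same batch; the alias PrevIdRole is made up for illustration.

```sql
SELECT
    Main.*
    -- IdRole of the previous row within the same document, NULL for the first row
    ,LAG(Main.IdRole) OVER (PARTITION BY Main.IdDocument ORDER BY Main.DateChange) AS PrevIdRole
FROM @LogTest AS Main
ORDER BY Main.IdDocument, Main.DateChange;
```

The OUTER APPLY form is what makes the approach usable on SQL Server 2008, at the cost of one correlated lookup per row.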

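2) We want to drop the rows whose IdRole repeats that of the previous row, so we add a WHERE clause and number the remaining rows. You can see that the row numbers match the expected result: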

SELECT
    Main.IdRole
    ,Main.IdDocument
    ,Main.DateChange
    ,ROW_NUMBER() OVER(PARTITION BY Main.IdDocument ORDER BY Main.DateChange) AS rn
FROM
    @LogTest AS Main
    OUTER APPLY
    (
        SELECT TOP (1) T.IdRole
        FROM @LogTest AS T
        WHERE
            T.IdDocument = Main.IdDocument
            AND T.DateChange < Main.DateChange
        ORDER BY T.DateChange DESC
    ) AS Prev
WHERE Main.IdRole <> Prev.IdRole OR Prev.IdRole IS NULL
;

This is the result set of this step (it becomes the CTE):

IdRole    IdDocument    DateChange                 rn
7         13            2015-01-26 20:13:26.247    1
3         13            2015-01-26 21:13:26.247    2
5         13            2015-01-27 00:13:26.247    3
3         13            2015-01-27 02:13:26.247    4
3         27            2015-01-27 01:13:26.247    1
5         27            2015-01-27 04:13:26.247    2
3         27            2015-01-27 05:13:26.247    3

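3) Finally, for each row of the original table we need to fetch the correct row number from the CTE. I use CROSS APPLY to get one row from the CTE for each row of the original table: the most recent CTE row with the same IdDocument and IdRole whose DateChange is not later than the current row's.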