How can one create an iterator in SQL that counts through rows as if they are in a set?

I have been looking for a way to do this in a single UPDATE statement, without success.

Here is a sample of the dataset I'm working with:

+-------------------------+----------+--------------+----+--------+
|        TIMESTAMP        | USERNAME |    VALUE     | ID | IsDupe |
+-------------------------+----------+--------------+----+--------+
| 2020-02-12 07:00:03.000 | LINA     | ORDER1       |  1 |      0 |
| 2020-02-12 07:00:03.000 | LINA     | ITEM1        |  2 |      0 |
| 2020-02-12 07:09:09.000 | LINA     | FINISH BUILD |  3 |      0 |
| 2020-02-12 07:09:10.000 | LINA     | ORDER1       |  4 |      0 |
| 2020-02-12 07:09:11.000 | LINA     | ITEM2        |  5 |      0 |
| 2020-02-12 07:24:07.000 | LINA     | FINISH BUILD |  6 |      0 |
| 2020-02-12 07:24:08.000 | NAGA     | ORDER2       |  7 |      0 |
| 2020-02-12 07:24:10.000 | NAGA     | ITEM3        |  8 |      0 |
| 2020-02-12 07:45:06.000 | NAGA     | FINISH BUILD |  9 |      0 |
| 2020-02-12 07:45:12.000 | NAGA     | FINISH BUILD | 10 |      1 |
| 2020-02-12 07:45:13.000 | XELLOS   | ORDER3       | 11 |      0 |
| 2020-02-12 07:45:14.000 | XELLOS   | ITEM4        | 12 |      0 |
| 2020-02-12 07:56:36.000 | XELLOS   | FINISH BUILD | 13 |      0 |
| 2020-02-12 07:56:39.000 | GOURRY   | ORDER4       | 14 |      0 |
| 2020-02-12 07:56:40.000 | GOURRY   | ITEM5        | 15 |      0 |
| 2020-02-12 08:30:11.000 | GOURRY   | FINISH BUILD | 17 |      0 |
+-------------------------+----------+--------------+----+--------+

What I want to do is create an additional column that acts as an iterator, splitting the rows into sets of three, like this:

+-------------------------+----------+--------------+-------+--------+-------+
|        TIMESTAMP        | USERNAME |    VALUE     | IDCol | IsDupe | SetID |
+-------------------------+----------+--------------+-------+--------+-------+
| 2020-02-12 07:00:03.000 | LINA     | ORDER1       |     1 |      0 | 1     |
| 2020-02-12 07:00:03.000 | LINA     | ITEM1        |     2 |      0 | 1     |
| 2020-02-12 07:09:09.000 | LINA     | FINISH BUILD |     3 |      0 | 1     |
| 2020-02-12 07:09:10.000 | LINA     | ORDER1       |     4 |      0 | 2     |
| 2020-02-12 07:09:11.000 | LINA     | ITEM2        |     5 |      0 | 2     |
| 2020-02-12 07:24:07.000 | LINA     | FINISH BUILD |     6 |      0 | 2     |
| 2020-02-12 07:24:08.000 | NAGA     | ORDER2       |     7 |      0 | 3     |
| 2020-02-12 07:24:10.000 | NAGA     | ITEM3        |     8 |      0 | 3     |
| 2020-02-12 07:45:06.000 | NAGA     | FINISH BUILD |     9 |      0 | 3     |
| 2020-02-12 07:45:12.000 | NAGA     | FINISH BUILD |    10 |      1 | NULL  |
| 2020-02-12 07:45:13.000 | XELLOS   | ORDER3       |    11 |      0 | 4     |
| 2020-02-12 07:45:14.000 | XELLOS   | ITEM4        |    12 |      0 | 4     |
| 2020-02-12 07:56:36.000 | XELLOS   | FINISH BUILD |    13 |      0 | 4     |
| 2020-02-12 07:56:39.000 | GOURRY   | ORDER4       |    14 |      0 | 5     |
| 2020-02-12 07:56:40.000 | GOURRY   | ITEM5        |    15 |      0 | 5     |
| 2020-02-12 08:30:11.000 | GOURRY   | FINISH BUILD |    17 |      0 | 5     |
+-------------------------+----------+--------------+-------+--------+-------+

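I have looked into iterative statements in SQL, but I have serious performance concerns: this will be a relatively large dataset, and the statement needs to run during the day, when it will affect production.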

Also note that the dataset may contain duplicates or other errors. The statement must ignore rows where IsDupe is set to 1.

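I have been trying to build a cursor to do this, but I have run into a number of syntax issues, on top of a general lack of experience writing cursors: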

DECLARE @MyCursor CURSOR;
DECLARE @SetID INT;
DECLARE @OUTPUTNUM TINYINT;
DECLARE @COUNTER TINYINT;
BEGIN
    SET @MyCursor = CURSOR LOCAL FAST_FORWARD FOR
    SELECT IsDupe from dbo.MyDataTable
        WHERE IsDupe != 1
    OPEN @MyCursor 
    FETCH NEXT FROM @MyCursor INTO @SetID

    WHILE @@FETCH_STATUS = 0 BEGIN
      SET @COUNTER = 0;
      SET @OUTPUTNUM = 1;
      WHILE @COUNTER < 3
        BEGIN 
            UPDATE dbo.MyDataTable SET dbo.MyDataTable.SetID = @OUTPUTNUM
            SET @COUNTER = @COUNTER + 1
        END 
        SET @COUNTER = 0;
        SET @OUTPUTNUM =  @OUTPUTNUM + 1
      FETCH NEXT FROM @MyCursor 
      INTO @SetID 
    END; 

    CLOSE @MyCursor ;
    DEALLOCATE @MyCursor;
END;

When I run it, I get the following output:

[2:07:54 PM]    Started executing query at Line 1

Commands completed successfully. 

Total execution time: 00:00:00.026

But there are no results: the values in the SetID column are all still NULL.
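For comparison, here is a sketch of how the cursor could be repaired, assuming the goal is simply to put every three non-duplicate rows (in ID order) into the same set. The loop above fetches IsDupe instead of ID, resets @OUTPUTNUM to 1 for every fetched row, and runs an UPDATE with no WHERE clause, which touches every row in the table on each pass. The version below fetches ID, keeps its counters across iterations, and updates only the current row. #CursorDemo is a hypothetical stand-in for dbo.MyDataTable that holds only the question's IDs and IsDupe flags. It still works one row at a time, so on a large production table a set-based statement is likely to be much faster.

```sql
-- Sketch: a corrected row-by-row cursor. #CursorDemo is a hypothetical
-- stand-in for dbo.MyDataTable with only the columns the cursor needs.
IF OBJECT_ID('tempdb..#CursorDemo') IS NOT NULL DROP TABLE #CursorDemo;
CREATE TABLE #CursorDemo (ID INT PRIMARY KEY, IsDupe BIT NOT NULL, SetID INT NULL);
-- IDs and IsDupe flags copied from the question's sample data
INSERT INTO #CursorDemo (ID, IsDupe) VALUES
    (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0),
    (9, 0), (10, 1), (11, 0), (12, 0), (13, 0), (14, 0), (15, 0), (17, 0);

DECLARE @ID INT;
DECLARE @RowInSet INT = 0;   -- how many rows have been placed in the current set
DECLARE @SetID INT = 1;      -- INT instead of TINYINT, which would overflow past 255

DECLARE MyCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID                -- fetch the key so the UPDATE can target one row
    FROM #CursorDemo
    WHERE IsDupe = 0         -- duplicates are skipped and keep SetID = NULL
    ORDER BY ID;             -- without ORDER BY the fetch order is not guaranteed

OPEN MyCursor;
FETCH NEXT FROM MyCursor INTO @ID;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Update only the row the cursor is currently positioned on.
    UPDATE #CursorDemo SET SetID = @SetID WHERE ID = @ID;

    SET @RowInSet = @RowInSet + 1;
    IF @RowInSet = 3         -- every third row closes the current set
    BEGIN
        SET @RowInSet = 0;
        SET @SetID = @SetID + 1;
    END;

    FETCH NEXT FROM MyCursor INTO @ID;
END;

CLOSE MyCursor;
DEALLOCATE MyCursor;

SELECT ID, IsDupe, SetID FROM #CursorDemo ORDER BY ID;
```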

You can do this without a cursor by using window functions:

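select [TIMESTAMP], USERNAME, VALUE, ID, IsDupe,
       DENSE_RANK() over (order by GroupID) as SetID
from (
    select
        *,
        -- GroupID = ID of the ORDER row that opens each set; this relies on the
        -- non-duplicate rows following ORDER / ITEM / FINISH BUILD in ID order
        case when value like 'ORDER%'        then ID
             when value like 'ITEM%'         then lag(ID, 1) over (order by ID)
             when value like 'FINISH BUILD%' then lag(ID, 2) over (order by ID)
        end as GroupID
    from #tmp
    where IsDupe = 0   -- duplicates are filtered out before lag() looks back
) a

union all   -- the two branches never overlap, so no de-duplication is needed

-- duplicate rows get no set number
select [TIMESTAMP], USERNAME, VALUE, ID, IsDupe, null as SetID
from #tmp
where IsDupe = 1
order by ID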
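If every set is guaranteed to be exactly three non-duplicate rows, a simpler variant (an alternative sketch, not the lag() method used above) numbers the non-duplicate rows with ROW_NUMBER() and uses integer division by 3. The temp tables #Rows and #Numbered are hypothetical and hold only the ID and IsDupe values from the question's sample data.

```sql
-- Alternative sketch, not the answer's method: assumes every set is exactly
-- three consecutive non-duplicate rows in ID order.
IF OBJECT_ID('tempdb..#Rows') IS NOT NULL DROP TABLE #Rows;
IF OBJECT_ID('tempdb..#Numbered') IS NOT NULL DROP TABLE #Numbered;

CREATE TABLE #Rows (ID INT PRIMARY KEY, IsDupe BIT NOT NULL);
-- IDs and IsDupe flags copied from the question's sample data
INSERT INTO #Rows (ID, IsDupe) VALUES
    (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0),
    (9, 0), (10, 1), (11, 0), (12, 0), (13, 0), (14, 0), (15, 0), (17, 0);

SELECT ID,
       IsDupe,
       CASE WHEN IsDupe = 1 THEN NULL
            -- PARTITION BY IsDupe numbers the non-duplicate rows 1, 2, 3, ...
            -- on their own; integer division then maps rows 1-3 to set 1,
            -- rows 4-6 to set 2, and so on.
            ELSE (ROW_NUMBER() OVER (PARTITION BY IsDupe ORDER BY ID) - 1) / 3 + 1
       END AS SetID
INTO #Numbered
FROM #Rows;

SELECT ID, IsDupe, SetID FROM #Numbered ORDER BY ID;
```

This version ignores VALUE entirely, so a single missing or extra non-duplicate row shifts every later set, whereas the lag() version anchors each set on its ORDER row.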

Here is my full example:

if object_id('tempdb..#tmp') is not null drop table #tmp   -- avoids an error on the first run

select '2020-02-12 07:00:03.000' as TIMESTAMP, 'LINA'  as USERNAME   , 'ORDER1'  as VALUE     ,  1 as ID ,      0 as IsDupe   into #tmp
union select  '2020-02-12 07:00:03.000' , 'LINA'     , 'ITEM1'        ,  2 ,      0 
union select  '2020-02-12 07:09:09.000' , 'LINA'     , 'FINISH BUILD' ,  3 ,      0 
union select  '2020-02-12 07:09:10.000' , 'LINA'     , 'ORDER1'       ,  4 ,      0 
union select  '2020-02-12 07:09:11.000' , 'LINA'     , 'ITEM2'        ,  5 ,      0 
union select  '2020-02-12 07:24:07.000' , 'LINA'     , 'FINISH BUILD' ,  6 ,      0 
union select  '2020-02-12 07:24:08.000' , 'NAGA'     , 'ORDER2'       ,  7 ,      0 
union select  '2020-02-12 07:24:10.000' , 'NAGA'     , 'ITEM3'        ,  8 ,      0 
union select  '2020-02-12 07:45:06.000' , 'NAGA'     , 'FINISH BUILD' ,  9 ,      0 
union select  '2020-02-12 07:45:12.000' , 'NAGA'     , 'FINISH BUILD' , 10 ,      1 
union select  '2020-02-12 07:45:13.000' , 'XELLOS'   , 'ORDER3'       , 11 ,      0 
union select  '2020-02-12 07:45:14.000' , 'XELLOS'   , 'ITEM4'        , 12 ,      0 
union select  '2020-02-12 07:56:36.000' , 'XELLOS'   , 'FINISH BUILD' , 13 ,      0 
union select  '2020-02-12 07:56:39.000' , 'GOURRY'   , 'ORDER4'       , 14 ,      0 
union select  '2020-02-12 07:56:40.000' , 'GOURRY'  , 'ITEM5'        , 15 ,      0 
union select  '2020-02-12 08:30:11.000' , 'GOURRY'   , 'FINISH BUILD' , 17 ,      0  
order by ID

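select [TIMESTAMP], USERNAME, VALUE, ID, IsDupe,
       DENSE_RANK() over (order by GroupID) as SetID
from (
    select
        *,
        -- GroupID = ID of the ORDER row that opens each set; this relies on the
        -- non-duplicate rows following ORDER / ITEM / FINISH BUILD in ID order
        case when value like 'ORDER%'        then ID
             when value like 'ITEM%'         then lag(ID, 1) over (order by ID)
             when value like 'FINISH BUILD%' then lag(ID, 2) over (order by ID)
        end as GroupID
    from #tmp
    where IsDupe = 0   -- duplicates are filtered out before lag() looks back
) a

union all   -- the two branches never overlap, so no de-duplication is needed

-- duplicate rows get no set number
select [TIMESTAMP], USERNAME, VALUE, ID, IsDupe, null as SetID
from #tmp
where IsDupe = 1
order by ID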

Output:

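+-------------------------+----------+--------------+----+--------+-------+
|        TIMESTAMP        | USERNAME |    VALUE     | ID | IsDupe | SetID |
+-------------------------+----------+--------------+----+--------+-------+
| 2020-02-12 07:00:03.000 | LINA     | ORDER1       |  1 |      0 | 1     |
| 2020-02-12 07:00:03.000 | LINA     | ITEM1        |  2 |      0 | 1     |
| 2020-02-12 07:09:09.000 | LINA     | FINISH BUILD |  3 |      0 | 1     |
| 2020-02-12 07:09:10.000 | LINA     | ORDER1       |  4 |      0 | 2     |
| 2020-02-12 07:09:11.000 | LINA     | ITEM2        |  5 |      0 | 2     |
| 2020-02-12 07:24:07.000 | LINA     | FINISH BUILD |  6 |      0 | 2     |
| 2020-02-12 07:24:08.000 | NAGA     | ORDER2       |  7 |      0 | 3     |
| 2020-02-12 07:24:10.000 | NAGA     | ITEM3        |  8 |      0 | 3     |
| 2020-02-12 07:45:06.000 | NAGA     | FINISH BUILD |  9 |      0 | 3     |
| 2020-02-12 07:45:12.000 | NAGA     | FINISH BUILD | 10 |      1 | NULL  |
| 2020-02-12 07:45:13.000 | XELLOS   | ORDER3       | 11 |      0 | 4     |
| 2020-02-12 07:45:14.000 | XELLOS   | ITEM4        | 12 |      0 | 4     |
| 2020-02-12 07:56:36.000 | XELLOS   | FINISH BUILD | 13 |      0 | 4     |
| 2020-02-12 07:56:39.000 | GOURRY   | ORDER4       | 14 |      0 | 5     |
| 2020-02-12 07:56:40.000 | GOURRY   | ITEM5        | 15 |      0 | 5     |
| 2020-02-12 08:30:11.000 | GOURRY   | FINISH BUILD | 17 |      0 | 5     |
+-------------------------+----------+--------------+----+--------+-------+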
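Because the question asks for a single UPDATE statement, the answer's query can also be folded into one. The sketch below assumes SQL Server 2012 or later (LAG was introduced in that version) and uses a hypothetical temp table #MyDataTable in place of dbo.MyDataTable, with only the columns the update needs. It computes the same GroupID and DENSE_RANK values as the SELECT above, then joins them back to the table by ID.

```sql
-- Sketch: the answer's lag()/DENSE_RANK() logic as one UPDATE statement.
-- #MyDataTable is a hypothetical stand-in for dbo.MyDataTable.
IF OBJECT_ID('tempdb..#MyDataTable') IS NOT NULL DROP TABLE #MyDataTable;
CREATE TABLE #MyDataTable (
    ID      INT          PRIMARY KEY,
    [VALUE] VARCHAR(20)  NOT NULL,
    IsDupe  BIT          NOT NULL,
    SetID   INT          NULL
);
-- Rows copied from the question's sample data
INSERT INTO #MyDataTable (ID, [VALUE], IsDupe) VALUES
    (1, 'ORDER1', 0), (2, 'ITEM1', 0), (3, 'FINISH BUILD', 0),
    (4, 'ORDER1', 0), (5, 'ITEM2', 0), (6, 'FINISH BUILD', 0),
    (7, 'ORDER2', 0), (8, 'ITEM3', 0), (9, 'FINISH BUILD', 0),
    (10, 'FINISH BUILD', 1),
    (11, 'ORDER3', 0), (12, 'ITEM4', 0), (13, 'FINISH BUILD', 0),
    (14, 'ORDER4', 0), (15, 'ITEM5', 0), (17, 'FINISH BUILD', 0);

WITH Grouped AS (
    SELECT ID,
           -- Same rule as the answer: the ID of the ORDER row that opens the set
           CASE WHEN [VALUE] LIKE 'ORDER%'        THEN ID
                WHEN [VALUE] LIKE 'ITEM%'         THEN LAG(ID, 1) OVER (ORDER BY ID)
                WHEN [VALUE] LIKE 'FINISH BUILD%' THEN LAG(ID, 2) OVER (ORDER BY ID)
           END AS GroupID
    FROM #MyDataTable
    WHERE IsDupe = 0
),
Ranked AS (
    SELECT ID, DENSE_RANK() OVER (ORDER BY GroupID) AS NewSetID
    FROM Grouped
)
UPDATE t
SET    t.SetID = r.NewSetID
FROM   #MyDataTable AS t
JOIN   Ranked AS r ON r.ID = t.ID;   -- IsDupe = 1 rows are not in Ranked, so they stay NULL

SELECT ID, [VALUE], IsDupe, SetID FROM #MyDataTable ORDER BY ID;
```

To run it against the real table, drop the setup statements and replace #MyDataTable with dbo.MyDataTable. Whether one set-based UPDATE is fast enough during production hours will depend on table size and indexing (an index on ID, for instance), so it is worth timing on a copy of the data first.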