Count consecutive NULL values for given ID

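I'm trying to create a query that includes a running count of the ISO weeks in which an account has no entry in the [Volume] table. The query only returns a sample of the accounts, so I have created a couple of CTEs to limit the number of records, and I join to the volumes table so that weeks with no volume still appear in the results.

To illustrate the result I'm looking for: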

ISOWk  | SurrID | Weekly Volume | No vol Count
201601 |    001 |             0 |            1
201601 |    002 |             5 |            0
201602 |    001 |             0 |            2
201602 |    002 |             0 |            1
201603 |    001 |           125 |            0
201603 |    002 |            75 |            0
201604 |    001 |             0 |            1
201604 |    002 |            75 |            0

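As you can see, the account with SurrID 001 has no volume in weeks 201601 and 201602, so [No vol Count] is 2 in week 201602. In week 201603 there is volume, so the counter resets to 0, and it then goes back up to 1 in week 201604.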
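To pin down the behaviour described above, here is a minimal sketch in Python. It is not part of the query, and the function name no_volume_counts is made up for illustration. It walks one account's weekly volumes in week order, adds 1 to the streak on a zero-volume week, and resets the streak on any week with volume.

```python
def no_volume_counts(volumes):
    """Running count of consecutive zero-volume weeks for one account.

    `volumes` must already be ordered by ISO week. The count resets to 0
    on any week that has volume.
    """
    counts = []
    streak = 0
    for volume in volumes:
        streak = streak + 1 if volume == 0 else 0
        counts.append(streak)
    return counts


# SurrID 001 from the table above, weeks 201601 to 201604
print(no_volume_counts([0, 0, 125, 0]))  # [1, 2, 0, 1]
# SurrID 002 from the table above
print(no_volume_counts([5, 0, 75, 75]))  # [0, 1, 0, 0]
```

Any SQL solution should reproduce these two sequences for the sample accounts.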

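From the research I've done, I have managed to get a consecutive running count using ROW_NUMBER and window functions, but it does not reset when there is volume (as in week 201603 in my example). What I can't work out is how to count consecutive zero values and reset when required.

I've included my full query below so you can see the whole picture (please point out any particularly bad practice here, I'm still finding my way!). Everything worked as expected until I included the third CTE, "NDs". The query then takes 45 minutes to return just over 2,000 rows, and what it returns is a non-resetting count of the rows that have no weekly volume.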

WITH Surrs AS
    (
        SELECT SurrID, OracleStartDate AS OSD
        FROM (
                SELECT ca.SurrID, OracleStartDate, ROW_NUMBER() OVER(ORDER BY OracleStartDate) as rn
                FROM tblCustomerAccounts ca
                JOIN tblAccountUpdates au
                    ON ca.SurrID = au.SurrID
                WHERE CustomerType_ID IN (1,2,3,4,5,6,7,12)
                    AND au.ISOWk = 201641
            ) a
        WHERE rn % 1000 = 0
    ),
    Updates AS
    (
        SELECT au.ISOWk, s.SurrID, (CASE WHEN AccStatus_ID = 1 THEN 'A' ELSE 'I' END) AS AccStatus, (CASE WHEN dbo.udf_ConvertDateToISOWeek(OSD) <= BBC THEN 'B' ELSE 'F' END) AS Book
        FROM Surrs s            
        JOIN (
                SELECT  ISOWk,
                        (SELECT BBCutOff FROM dbo.udf_CutOffWeeks(ISOWk)) AS BBC,
                        (SELECT FYStart FROM dbo.udf_CutOffWeeks(ISOWk)) AS FYS,
                        (SELECT FYEnd FROM dbo.udf_CutOffWeeks(ISOWk)) AS FYE,
                        (SELECT BYStart FROM dbo.udf_CutOffWeeks(ISOWk)) AS BYS,
                        (SELECT BYEnd FROM dbo.udf_CutOffWeeks(ISOWk)) AS BYE,
                        SurrID,
                        AccStatus_ID
                FROM tblAccountUpdates
            ) au
            ON au.SurrID = s.SurrID
    ), 
    NDs AS 
    (
        SELECT u.ISOWk, u.SurrID, ROW_NUMBER() OVER (PARTITION BY u.SurrID ORDER BY u.ISOWk) AS NDCount
        FROM Updates u
        LEFT JOIN tblTotalVolumes tv
            ON u.SurrID = tv.SurrID
            AND u.ISOWk = tv.ISOWk
        WHERE tv.Volume IS NULL
            AND u.ISOWk >= 201601

    )
SELECT  tw.ISOWk, 
        tw.SurrID, 
        (CASE WHEN Volume IS NULL THEN 0 ELSE Volume END) AS [Weekly Volume], 
        tw.Book, 
        tw.AccStatus,
        (CASE WHEN tw.AccStatus = 'I' AND lw.AccStatus = 'A' THEN 'Y' ELSE '' END) AS [Stopped this week],
        (CASE WHEN tw.AccStatus = 'A' AND lw.AccStatus = 'I' THEN 'Y' ELSE '' END) AS [Restarted this week],
        (CASE WHEN NDCount IS NULL THEN 0 ELSE NDCount END) AS [Consecutive ND Weeks]

FROM Updates tw
JOIN Updates lw
    ON lw.ISOWk = dbo.udf_ConvertDateToISOWeek(DATEADD("ww",-1,dbo.udf_ConvertISOWkToDate(tw.ISOWk)))
    AND tw.SurrID = lw.SurrID
LEFT JOIN tblTotalVolumes tv
    ON tw.SurrID = tv.SurrID
    AND tw.ISOWk = tv.ISOWk
LEFT JOIN NDs
    ON tw.SurrID = nds.SurrID
    AND tw.ISOWk = nds.ISOWk

ORDER BY tw.ISOWk

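To reiterate what I need: the [Consecutive ND Weeks] column should count the consecutive weeks in which [Weekly Volume] is 0. Any help is much appreciated.

Thanks

Update:

I've tried to implement @Gordon Linoff's post, but my counter does not reset to 0 when [Weekly Volume] has a value. Here is my amended query: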

SELECT t.*, (CASE WHEN [Weekly Volume] = 0 THEN ROW_NUMBER() OVER (PARTITION BY t.SurrID, grp ORDER BY ISOWk) ELSE 0 END) AS [ND Count]

FROM (
        SELECT  tw.ISOWk, 
                s.SurrID, 
                tw.AccStatus,
                (CASE WHEN tv.Volume IS NULL THEN 0 ELSE tv.Volume END) AS [Weekly Volume],
                (CASE WHEN dbo.udf_ConvertDateToISOWeek(OSD) <= BBC THEN 'B' ELSE 'F' END) AS Book,
                (CASE WHEN tw.AccStatus = 'I' AND lw.AccStatus = 'A' THEN 'Y' ELSE '' END) AS [Stopped this week],
                (CASE WHEN tw.AccStatus = 'A' AND lw.AccStatus = 'I' THEN 'Y' ELSE '' END) AS [Restarted this week],
                SUM(CASE WHEN tv.volume > 0 THEN 1 ELSE 0 END) OVER(PARTITION BY tv.SurrID ORDER BY tv.ISOWk) AS grp

        FROM (
                SELECT SurrID, OracleStartDate AS OSD
                FROM (
                        SELECT ca.SurrID, OracleStartDate, ROW_NUMBER() OVER(ORDER BY OracleStartDate) as rn
                        FROM tblCustomerAccounts ca
                        JOIN tblAccountUpdates au
                            ON ca.SurrID = au.SurrID
                        WHERE CustomerType_ID IN (1,2,3,4,5,6,7,12)
                            AND au.ISOWk = 201641
                    ) a
                WHERE rn % 1000 = 0
            ) s         
        JOIN (
                SELECT  ISOWk,
                        (SELECT BBCutOff FROM dbo.udf_CutOffWeeks(ISOWk)) AS BBC,
                        (SELECT FYStart FROM dbo.udf_CutOffWeeks(ISOWk)) AS FYS,
                        (SELECT FYEnd FROM dbo.udf_CutOffWeeks(ISOWk)) AS FYE,
                        (SELECT BYStart FROM dbo.udf_CutOffWeeks(ISOWk)) AS BYS,
                        (SELECT BYEnd FROM dbo.udf_CutOffWeeks(ISOWk)) AS BYE,
                        SurrID,
                        (CASE WHEN AccStatus_ID = 1 THEN 'A' ELSE 'I' END) AS AccStatus
                FROM tblAccountUpdates
            ) tw
            ON tw.SurrID = s.SurrID
        JOIN (
                SELECT  ISOWk,
                        SurrID,
                        (CASE WHEN AccStatus_ID = 1 THEN 'A' ELSE 'I' END) AS AccStatus
                FROM tblAccountUpdates
            ) lw
            ON tw.SurrID = lw.SurrID
            AND dbo.udf_ConvertDateToISOWeek(DATEADD("ww",-1,dbo.udf_ConvertISOWkToDate(tw.ISOWk))) = lw.ISOWk
        LEFT JOIN tblTotalVolumes tv
            ON tw.ISOWk = tv.ISOWk
            AND tw.SurrID = tv.SurrID
    ) t

ORDER BY ISOWk

Update:

I've now amended my query to reflect Vladimir's solution (once again, this is the full query):

SELECT  ISOWk, 
        SurrID, 
        AccStatus, 
        [Weekly Volume], 
        Book, 
        [Stopped this week], 
        [Restarted this week],
        RN1,
        RN2,
        grp,
        rn3,
        (CASE WHEN [Weekly Volume] = 0 THEN rn3 ELSE 0 END) AS [ND Count]
FROM (
        SELECT  t.ISOWk,
                t.SurrID,
                t.AccStatus,
                t.[Weekly Volume],
                t.Book,
                t.[Stopped this week],
                t.[Restarted this week],
                rn1,
                rn2,
                rn1 - rn2 AS grp,
                ROW_NUMBER() OVER(PARTITION BY t.SurrID, rn1-rn2 ORDER BY ISOWk) AS rn3

        FROM (
                SELECT  tw.ISOWk, 
                        s.SurrID, 
                        tw.AccStatus,
                        (CASE WHEN tv.Volume IS NULL THEN 0 ELSE tv.Volume END) AS [Weekly Volume],
                        (CASE WHEN dbo.udf_ConvertDateToISOWeek(OSD) <= BBC THEN 'B' ELSE 'F' END) AS Book,
                        (CASE WHEN tw.AccStatus = 'I' AND lw.AccStatus = 'A' THEN 'Y' ELSE '' END) AS [Stopped this week],
                        (CASE WHEN tw.AccStatus = 'A' AND lw.AccStatus = 'I' THEN 'Y' ELSE '' END) AS [Restarted this week],
                        ROW_NUMBER() OVER(PARTITION BY tw.SurrID ORDER BY tw.ISOWk) AS rn1,
                        ROW_NUMBER() OVER(PARTITION BY tw.SurrID, tv.Volume ORDER BY tw.ISOWk) AS rn2

                FROM (
                        SELECT SurrID, OracleStartDate AS OSD
                        FROM (
                                SELECT ca.SurrID, OracleStartDate, ROW_NUMBER() OVER(ORDER BY OracleStartDate) as rn
                                FROM tblCustomerAccounts ca
                                JOIN tblAccountUpdates au
                                    ON ca.SurrID = au.SurrID
                                WHERE CustomerType_ID IN (1,2,3,4,5,6,7,12)
                                    AND au.ISOWk = 201641
                            ) a
                        WHERE rn % 2000 = 0
                    ) s         
                JOIN (
                        SELECT  ISOWk,
                                (SELECT BBCutOff FROM dbo.udf_CutOffWeeks(ISOWk)) AS BBC,
                                (SELECT FYStart FROM dbo.udf_CutOffWeeks(ISOWk)) AS FYS,
                                (SELECT FYEnd FROM dbo.udf_CutOffWeeks(ISOWk)) AS FYE,
                                (SELECT BYStart FROM dbo.udf_CutOffWeeks(ISOWk)) AS BYS,
                                (SELECT BYEnd FROM dbo.udf_CutOffWeeks(ISOWk)) AS BYE,
                                SurrID,
                                (CASE WHEN AccStatus_ID = 1 THEN 'A' ELSE 'I' END) AS AccStatus
                        FROM tblAccountUpdates
                    ) tw
                    ON tw.SurrID = s.SurrID
                JOIN (
                        SELECT  ISOWk,
                                SurrID,
                                (CASE WHEN AccStatus_ID = 1 THEN 'A' ELSE 'I' END) AS AccStatus
                        FROM tblAccountUpdates
                    ) lw
                    ON tw.SurrID = lw.SurrID
                    AND dbo.udf_ConvertDateToISOWeek(DATEADD("ww",-1,dbo.udf_ConvertISOWkToDate(tw.ISOWk))) = lw.ISOWk
                LEFT JOIN tblTotalVolumes tv
                    ON tw.ISOWk = tv.ISOWk
                    AND tw.SurrID = tv.SurrID

            ) t
    ) x

ORDER BY ISOWk

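Here is a sample of the unexpected results (the counter does not restart at 1 after [Weekly Volume] has had a value greater than 0; the affected rows are marked with asterisks). The rows below all have the same ID, so I have removed the ID column.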

ISOWk  | Weekly Volume | rn1 | rn2 | grp | rn3 | ND Count |
201620 |            0  |  1  |  1  |  0  |  1  |    1     |
201621 |            0  |  2  |  2  |  0  |  2  |    2     |
201622 |            0  |  3  |  3  |  0  |  3  |    3     |
201623 |            0  |  4  |  4  |  0  |  4  |    4     |
201624 |            0  |  5  |  5  |  0  |  5  |    5     |
201625 |           53  |  6  |  1  |  5  |  1  |    0     |
201626 |           49  |  7  |  1  |  6  |  1  |    0     |
201627 |           98  |  8  |  1  |  7  |  1  |    0     |
201628 |           54  |  9  |  1  |  8  |  1  |    0     |
201629 |           53  | 10  |  2  |  8  |  2  |    0     |
201630 |          103  | 11  |  1  | 10  |  1  |    0     |
201631 |           59  | 12  |  1  | 11  |  1  |    0     |
201632 |           35  | 13  |  1  | 12  |  1  |    0     |
201633 |            0  | 14  |  6  |  8  |  3  |    3     |**
201634 |            0  | 15  |  7  |  8  |  4  |    4     |**
201635 |            0  | 16  |  8  |  8  |  5  |    5     |**
201636 |            0  | 17  |  9  |  8  |  6  |    6     |**
201637 |           87  | 18  |  1  | 17  |  1  |    0     |
201638 |          136  | 19  |  1  | 18  |  1  |    0     |
201639 |           56  | 20  |  1  | 19  |  1  |    0     |
201640 |           70  | 21  |  1  | 20  |  1  |    0     |
201641 |           77  | 22  |  1  | 21  |  1  |    0     |

There are other instances of this problem in my dataset.

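Here is one approach:

  1. For each row, count the non-zero values seen so far for the SurrId, including the current row. This number identifies a group.
  2. Within each group, keep a running count of the rows whose value is 0. A plain row_number() would be off by one from the second group onward, because each of those groups opens with the non-zero row itself.
  3. Use that count only when the value is 0.

This results in:

select t.*,
       (case when weeklyvolume = 0
             -- count only the zero rows; row_number() would also count the
             -- non-zero row that opens every group after the first
             then sum(case when weeklyvolume = 0 then 1 else 0 end)
                      over (partition by SurrId, grp order by ISOwk)
             else 0
        end) as NoVolCount
from (select t.*,
             -- running count of non-zero weeks: stays constant across a run of zero weeks
             sum(case when weeklyvolume > 0 then 1 else 0 end) over (partition by SurrId order by ISOwk) as grp
      from t
     ) t;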
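As a sanity check on the grouping idea, here is a minimal sketch using Python's built-in sqlite3 module. That choice is an assumption made purely so the example runs without SQL Server (it needs SQLite 3.25 or later for window functions); the table name t and the rows come from the question's first sample table. It computes grp as a running count of non-zero weeks and then compares two counters inside each group: a running count of zero rows, and row_number(). The comparison matters because every group after the first starts with the non-zero row, so row_number() on the zeros that follow it starts at 2.

```python
import sqlite3

# (ISOWk, SurrID, WeeklyVolume) rows from the question's first sample table
rows = [
    (201601, "001", 0), (201602, "001", 0), (201603, "001", 125), (201604, "001", 0),
    (201601, "002", 5), (201602, "002", 0), (201603, "002", 75), (201604, "002", 75),
]

QUERY = """
select ISOWk, SurrID,
       -- running count of zero rows inside the group
       sum(case when WeeklyVolume = 0 then 1 else 0 end)
           over (partition by SurrID, grp order by ISOWk) as zero_count,
       -- row_number() also counts the non-zero row that opens the group
       case when WeeklyVolume = 0
            then row_number() over (partition by SurrID, grp order by ISOWk)
            else 0 end as rn_count
from (select t.*,
             sum(case when WeeklyVolume > 0 then 1 else 0 end)
                 over (partition by SurrID order by ISOWk) as grp
      from t) as g
order by SurrID, ISOWk
"""


def run_demo():
    """Load the sample rows into an in-memory table and run the comparison."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (ISOWk integer, SurrID text, WeeklyVolume integer)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in run_demo():
    print(row)  # (ISOWk, SurrID, zero_count, rn_count)
```

For SurrID 001 in week 201604 the zero count is 1, which matches the expected output in the question, while the row_number() variant gives 2.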

This looks like a gaps-and-islands problem.

Sample data

DECLARE @T TABLE(ISOWk int, SurrID char(3), WeeklyVolume int);
INSERT INTO @T(ISOWk, SurrID, WeeklyVolume) VALUES
(201601, '001',  0),
(201601, '002',  5),
(201602, '001',  0),
(201602, '002',  0),
(201603, '001',125),
(201603, '002', 75),
(201604, '001',  0),
(201604, '002', 75),
(201620, '003',  0),
(201621, '003',  0),
(201622, '003',  0),
(201623, '003',  0),
(201624, '003',  0),
(201625, '003', 53),
(201626, '003', 49),
(201627, '003', 98),
(201628, '003', 54),
(201629, '003', 53),
(201630, '003',103),
(201631, '003', 59),
(201632, '003', 35),
(201633, '003',  0),
(201634, '003',  0),
(201635, '003',  0),
(201636, '003',  0),
(201637, '003', 87),
(201638, '003',136),
(201639, '003', 56),
(201640, '003', 70),
(201641, '003', 77),
(201601, '004',  0),
(201602, '004',  6),
(201603, '004',  0),
(201604, '004',  0);

I added your extended sample as SurrID=003 and a sample of my own as SurrID=004.

Query

WITH
CTE
AS
(
    SELECT
        ISOWk
        ,SurrID
        ,WeeklyVolume
        ,ROW_NUMBER() OVER (PARTITION BY SurrID ORDER BY ISOWk) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY SurrID,WeeklyVolume ORDER BY ISOWk) AS rn2
    FROM @T
)
,CTE2
AS
(
    SELECT
        ISOWk
        ,SurrID
        ,WeeklyVolume
        ,rn1
        ,rn2
        ,rn1-rn2 AS grp
        ,ROW_NUMBER() OVER (PARTITION BY SurrID,WeeklyVolume,rn1-rn2 ORDER BY ISOWk) AS rn3
    FROM CTE
)
SELECT
    ISOWk
    ,SurrID
    ,WeeklyVolume
    ,rn1
    ,rn2
    ,grp
    ,rn3
    ,CASE WHEN WeeklyVolume = 0 THEN rn3 ELSE 0 END AS NoVolumeCount
FROM CTE2
ORDER BY SurrID, ISOWk;

Result

+--------+--------+--------------+-----+-----+-----+-----+---------------+
| ISOWk  | SurrID | WeeklyVolume | rn1 | rn2 | grp | rn3 | NoVolumeCount |
+--------+--------+--------------+-----+-----+-----+-----+---------------+
| 201601 |    001 |            0 |   1 |   1 |   0 |   1 |             1 |
| 201602 |    001 |            0 |   2 |   2 |   0 |   2 |             2 |
| 201603 |    001 |          125 |   3 |   1 |   2 |   1 |             0 |
| 201604 |    001 |            0 |   4 |   3 |   1 |   1 |             1 |
| 201601 |    002 |            5 |   1 |   1 |   0 |   1 |             0 |
| 201602 |    002 |            0 |   2 |   1 |   1 |   1 |             1 |
| 201603 |    002 |           75 |   3 |   1 |   2 |   1 |             0 |
| 201604 |    002 |           75 |   4 |   2 |   2 |   2 |             0 |
| 201620 |    003 |            0 |   1 |   1 |   0 |   1 |             1 |
| 201621 |    003 |            0 |   2 |   2 |   0 |   2 |             2 |
| 201622 |    003 |            0 |   3 |   3 |   0 |   3 |             3 |
| 201623 |    003 |            0 |   4 |   4 |   0 |   4 |             4 |
| 201624 |    003 |            0 |   5 |   5 |   0 |   5 |             5 |
| 201625 |    003 |           53 |   6 |   1 |   5 |   1 |             0 |
| 201626 |    003 |           49 |   7 |   1 |   6 |   1 |             0 |
| 201627 |    003 |           98 |   8 |   1 |   7 |   1 |             0 |
| 201628 |    003 |           54 |   9 |   1 |   8 |   1 |             0 |
| 201629 |    003 |           53 |  10 |   2 |   8 |   1 |             0 |
| 201630 |    003 |          103 |  11 |   1 |  10 |   1 |             0 |
| 201631 |    003 |           59 |  12 |   1 |  11 |   1 |             0 |
| 201632 |    003 |           35 |  13 |   1 |  12 |   1 |             0 |
| 201633 |    003 |            0 |  14 |   6 |   8 |   1 |             1 |
| 201634 |    003 |            0 |  15 |   7 |   8 |   2 |             2 |
| 201635 |    003 |            0 |  16 |   8 |   8 |   3 |             3 |
| 201636 |    003 |            0 |  17 |   9 |   8 |   4 |             4 |
| 201637 |    003 |           87 |  18 |   1 |  17 |   1 |             0 |
| 201638 |    003 |          136 |  19 |   1 |  18 |   1 |             0 |
| 201639 |    003 |           56 |  20 |   1 |  19 |   1 |             0 |
| 201640 |    003 |           70 |  21 |   1 |  20 |   1 |             0 |
| 201641 |    003 |           77 |  22 |   1 |  21 |   1 |             0 |
| 201601 |    004 |            0 |   1 |   1 |   0 |   1 |             1 |
| 201602 |    004 |            6 |   2 |   1 |   1 |   1 |             0 |
| 201603 |    004 |            0 |   3 |   2 |   1 |   1 |             1 |
| 201604 |    004 |            0 |   4 |   3 |   1 |   2 |             2 |
+--------+--------+--------------+-----+-----+-----+-----+---------------+

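I have included the intermediate columns in the result so that you can see how it works.

The standard approach to gaps-and-islands uses two ROW_NUMBER sequences: one plain sequence (rn1) and a second one partitioned by WeeklyVolume (rn2).

The difference between rn1 and rn2 gives the ID of the group, or island (grp). Compute another row-number sequence partitioned by WeeklyVolume and that group (rn3), and use it only when WeeklyVolume is zero.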
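To make the rn1/rn2 arithmetic concrete, here is a small sketch in plain Python that reproduces the three columns for one account. It is an illustration only: the function name islands is made up, and the input is assumed to be one SurrID's volumes already ordered by ISOWk.

```python
from collections import defaultdict


def islands(volumes):
    """Return (rn1, rn2, grp) for each row of one account, ordered by week.

    rn1 numbers every row; rn2 numbers rows within the same volume value;
    grp = rn1 - rn2 stays constant while the same value repeats on
    consecutive rows.
    """
    seen = defaultdict(int)  # occurrences of each volume value so far
    out = []
    for rn1, volume in enumerate(volumes, start=1):
        seen[volume] += 1
        rn2 = seen[volume]
        out.append((rn1, rn2, rn1 - rn2))
    return out


# SurrID 004 from the sample data: volumes 0, 6, 0, 0
print(islands([0, 6, 0, 0]))  # [(1, 1, 0), (2, 1, 1), (3, 2, 1), (4, 3, 1)]
```

Note that grp is 1 both for the week with volume 6 and for the two zero weeks after it. The difference alone does not separate rows with different volumes, which is why rn3 is partitioned by WeeklyVolume as well as by grp.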

Obviously, everything above is partitioned by SurrID first.

In the first variant of this answer I forgot to include WeeklyVolume in the final partition for rn3.
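The effect of that omission can be reproduced on the SurrID 003 sample. The sketch below uses Python's built-in sqlite3 module, an assumption made only so it runs without SQL Server (SQLite 3.25 or later is needed for window functions). It computes rn3 twice, once partitioned by SurrID and rn1 - rn2 only, and once with WeeklyVolume added to the partition; the column names without_volume and with_volume are made up for illustration.

```python
import sqlite3

# SurrID 003 from the sample data, weeks 201620 to 201641
volumes = [0, 0, 0, 0, 0, 53, 49, 98, 54, 53, 103, 59, 35,
           0, 0, 0, 0, 87, 136, 56, 70, 77]
rows = [(201620 + i, "003", v) for i, v in enumerate(volumes)]

QUERY = """
with cte as (
    select ISOWk, SurrID, WeeklyVolume,
           row_number() over (partition by SurrID order by ISOWk) as rn1,
           row_number() over (partition by SurrID, WeeklyVolume order by ISOWk) as rn2
    from t
)
select ISOWk,
       -- rn3 without WeeklyVolume: rows with different volumes but the
       -- same rn1 - rn2 end up in one partition
       case when WeeklyVolume = 0
            then row_number() over (partition by SurrID, rn1 - rn2 order by ISOWk)
            else 0 end as without_volume,
       -- rn3 with WeeklyVolume in the partition
       case when WeeklyVolume = 0
            then row_number() over (partition by SurrID, WeeklyVolume, rn1 - rn2
                                    order by ISOWk)
            else 0 end as with_volume
from cte
order by ISOWk
"""


def rn3_variants():
    """Run both rn3 variants against an in-memory copy of the sample."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (ISOWk integer, SurrID text, WeeklyVolume integer)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for isowk, without_volume, with_volume in rn3_variants():
    if 201633 <= isowk <= 201636:
        print(isowk, without_volume, with_volume)
# 201633 3 1
# 201634 4 2
# 201635 5 3
# 201636 6 4
```

Weeks 201628 and 201629 (volumes 54 and 53) have the same rn1 - rn2 value, 8, as the zero weeks 201633 to 201636, so without WeeklyVolume in the partition the zeros are numbered 3 to 6 instead of 1 to 4.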