Find the number of rows before a value changes, with a GROUP BY

I have a table like this:

CREATE TABLE Levels
    ([userid] int, [counter1] int, [counter2] int, [date] datetime)
;

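counter2 is an incrementing value. date is simply the datetime at which the row was created. counter1 is a field that can take different integer values, and userid is the user's id.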

Here is a sample of the data. A larger example containing two users is available on sqlfiddle.

| userid | counter1 | counter2 |                 date |
|--------|----------|----------|----------------------|
|    123 |        6 |       42 | 2010-07-31T00:12:28Z |
|    123 |        6 |       43 | 2010-11-20T00:11:15Z |
|    123 |        6 |       44 | 2011-03-12T00:15:07Z |
|    123 |        5 |       45 | 2011-07-02T01:11:09Z |
|    123 |        5 |       46 | 2011-10-22T00:24:18Z |
|    123 |        5 |       47 | 2012-02-10T23:51:54Z |
|    123 |        5 |       48 | 2012-06-01T23:43:26Z |
|    123 |        5 |       49 | 2012-09-21T23:43:59Z |
|    123 |        4 |       50 | 2013-01-11T23:52:43Z |
|    123 |        4 |       51 | 2013-05-03T23:49:25Z |
|    123 |        4 |       52 | 2013-08-23T23:48:24Z |
|    123 |        3 |       53 | 2013-12-14T00:01:20Z |
|    123 |        3 |       54 | 2014-04-04T23:45:45Z |
|    123 |        4 |       55 | 2014-07-25T23:44:34Z |
|    123 |        5 |       56 | 2014-11-14T23:46:11Z |

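What I am trying to do is count how many consecutive rows counter1 keeps the same value before it changes. The other questions I found on Stack Overflow do not work for this case.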

The expected result for the full example in the sqlfiddle is:

| userid | counter1 | count |
|--------|----------|-------|
|     123|         6|      3|
|     123|         5|      5|
|     123|         4|      3|
|     123|         3|      2|
|     123|         4|      1|
|     123|         5|      1|
|     123|         6|      2|
|     123|         5|      5|
|     123|         4|      2|
|     123|         5|      1|
|     123|         4|      5|
|     123|         5|      5|
|     345|         6|      2|
|     345|         6|      9|

This is a type of gaps-and-islands problem. Fortunately, you can use the difference of row numbers:

select userid, counter1, count(*) as [count]
from (select t.*,
             -- position of the row within the user's whole sequence
             row_number() over (partition by userid order by counter2) as seqnum,
             -- position of the row among the user's rows with the same counter1
             row_number() over (partition by userid, counter1 order by counter2) as seqnum_2
      from Levels t
     ) t
group by userid, counter1, (seqnum - seqnum_2)
order by userid, min(counter2);

Note: this assumes that the ordering is based on counter2. If it is really based on date, use that column instead.

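Why this works is a bit tricky to explain. If you look at the results of the subquery, though, you will see that the difference between the two row_number() values stays constant as long as counter1 has the same value on adjacent rows, and changes as soon as counter1 changes.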
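To make the constant difference visible, here is a minimal Python sketch that mimics the two row_number() calls on the sample rows from the question. It is only an illustration of the idea, not the SQL itself; the function names `row_number_diffs` and `count_islands` are made up for this example, and ordering by counter2 is assumed, as in the query above.

```python
from collections import defaultdict

# (userid, counter1, counter2) rows copied from the sample table in the question
rows = [
    (123, 6, 42), (123, 6, 43), (123, 6, 44),
    (123, 5, 45), (123, 5, 46), (123, 5, 47), (123, 5, 48), (123, 5, 49),
    (123, 4, 50), (123, 4, 51), (123, 4, 52),
    (123, 3, 53), (123, 3, 54),
    (123, 4, 55),
    (123, 5, 56),
]


def row_number_diffs(rows):
    """Mimic both row_number() calls and return their difference for each row."""
    per_user = defaultdict(int)        # partition by userid
    per_user_value = defaultdict(int)  # partition by userid, counter1
    out = []
    # Assumption: rows are ordered by counter2 within each user.
    for userid, counter1, counter2 in sorted(rows, key=lambda r: (r[0], r[2])):
        per_user[userid] += 1
        per_user_value[(userid, counter1)] += 1
        seqnum = per_user[userid]
        seqnum_2 = per_user_value[(userid, counter1)]
        out.append((userid, counter1, counter2, seqnum, seqnum_2, seqnum - seqnum_2))
    return out


def count_islands(rows):
    """Group by (userid, counter1, difference), ordered by the smallest counter2."""
    groups = {}
    for userid, counter1, counter2, _, _, diff in row_number_diffs(rows):
        key = (userid, counter1, diff)
        first, n = groups.get(key, (counter2, 0))
        groups[key] = (min(first, counter2), n + 1)
    ordered = sorted(groups.items(), key=lambda kv: (kv[0][0], kv[1][0]))
    return [(userid, counter1, n) for (userid, counter1, _), (_, n) in ordered]


for row in row_number_diffs(rows):
    print(row)  # the last field is seqnum - seqnum_2
print(count_islands(rows))
```

On the sample rows the difference is 0 for the first run of 6, 3 for the run of 5, 8 for the run of 4, 11 for the run of 3, then 10 and 9 for the single 4 and 5 at the end. Each run therefore gets its own (counter1, difference) pair, which is what the group by relies on.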

You don't actually need LEAD/LAG here. That said, getting onto a supported version of SQL Server, where LAG (and LEAD) are available, should be a priority.

WITH YourTable AS(
    SELECT *
    FROM (VALUES(123,6,42,CONVERT(datetime2(0),'2010-07-31T00:12:28Z')),
                (123,6,43,CONVERT(datetime2(0),'2010-11-20T00:11:15Z')),
                (123,6,44,CONVERT(datetime2(0),'2011-03-12T00:15:07Z')),
                (123,5,45,CONVERT(datetime2(0),'2011-07-02T01:11:09Z')),
                (123,5,46,CONVERT(datetime2(0),'2011-10-22T00:24:18Z')),
                (123,5,47,CONVERT(datetime2(0),'2012-02-10T23:51:54Z')),
                (123,5,48,CONVERT(datetime2(0),'2012-06-01T23:43:26Z')),
                (123,5,49,CONVERT(datetime2(0),'2012-09-21T23:43:59Z')),
                (123,4,50,CONVERT(datetime2(0),'2013-01-11T23:52:43Z')),
                (123,4,51,CONVERT(datetime2(0),'2013-05-03T23:49:25Z')),
                (123,4,52,CONVERT(datetime2(0),'2013-08-23T23:48:24Z')),
                (123,3,53,CONVERT(datetime2(0),'2013-12-14T00:01:20Z')),
                (123,3,54,CONVERT(datetime2(0),'2014-04-04T23:45:45Z')),
                (123,4,55,CONVERT(datetime2(0),'2014-07-25T23:44:34Z')),
                (123,5,56,CONVERT(datetime2(0),'2014-11-14T23:46:11Z')))V(userid,counter1,counter2,date)),
Grps AS (
    SELECT userid,
           counter1,
           counter2,
           date,
           ROW_NUMBER() OVER (PARTITION BY userid ORDER BY [date]) - 
           ROW_NUMBER() OVER (PARTITION BY userid,counter1 ORDER BY [date]) AS Grp
    FROM YourTable)
SELECT userid,
       counter1,
       COUNT(*) AS [count]
FROM Grps
GROUP BY userid,
         counter1,
         Grp
ORDER BY userid,
         MIN([date]);
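
For comparison, the LAG-style idea that the answer says is not required here amounts to comparing each row with the previous row of the same user and starting a new group whenever counter1 differs. The following Python sketch illustrates that logic on the sample rows. It is not T-SQL, the function name `count_runs_lag` is hypothetical, and ordering by counter2 is an assumption.

```python
# (userid, counter1, counter2) rows copied from the sample table in the question
rows = [
    (123, 6, 42), (123, 6, 43), (123, 6, 44),
    (123, 5, 45), (123, 5, 46), (123, 5, 47), (123, 5, 48), (123, 5, 49),
    (123, 4, 50), (123, 4, 51), (123, 4, 52),
    (123, 3, 53), (123, 3, 54),
    (123, 4, 55),
    (123, 5, 56),
]


def count_runs_lag(rows):
    """Count consecutive runs of counter1 per user by looking at the previous row."""
    result = []
    prev_key = None  # (userid, counter1) of the previous row, playing the role of LAG
    # Assumption: rows are ordered by counter2 within each user.
    for userid, counter1, _counter2 in sorted(rows, key=lambda r: (r[0], r[2])):
        key = (userid, counter1)
        if key == prev_key:
            # Same user and same value as the previous row: extend the current run.
            u, c, n = result[-1]
            result[-1] = (u, c, n + 1)
        else:
            # The value (or the user) changed: start a new run.
            result.append((userid, counter1, 1))
        prev_key = key
    return result


print(count_runs_lag(rows))
```

Both approaches give the same counts on the sample data. The row-number difference has the advantage of working on versions of SQL Server that lack LAG.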