SQL query to count rows based on previous values of different column

I work in SAS and I have a table that looks like this:

ID | Time | Main | lag_1 | lag_2
----------------------------------------------------------------------------
A  |  01  |   0  |   0   |  1  
A  |  03  |   0  |   0   |  1  
A  |  04  |   0  |   0   |  0  
A  |  10  |   1  |   0   |  0  
A  |  11  |   1  |   0   |  0  
A  |  12  |   1  |   0   |  0  
B  |  02  |   1  |   1   |  1  
B  |  04  |   0  |   1   |  1  
B  |  07  |   0  |   0   |  1  
B  |  10  |   1  |   0   |  0  
B  |  11  |   1  |   0   |  0  
B  |  12  |   1  |   0   |  0  

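Except that the real table has more IDs. The table is sorted by ID and Time. After computing the total count in the Main column (call it tot), I am trying to compute 2 things:

  1. The total count in the Main column only when lag_1 was equal to 1 at some time before Main became 1, call it tot_1; and
  2. The same as 1, but for lag_2, calling the variable tot_2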

The expected computed table would give me

tot | tot_1 | tot_2
--------------------
 7  |   3   |   6

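because tot_1 should be 3 (0 from ID = A + 3 from ID = B), and tot_2 should be 6 (3 from ID = A + 3 from ID = B).

I am a complete beginner at this kind of breakdown, so any help would be greatly appreciated.

Edit: I expect tot_2 >= tot_1, because lag_2 is built from the Main event and looks back further in time than lag_1.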

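If I understand correctly, you want these sums per id. The key is to compare, within each id, the earliest time at which each flag is set, and then sum. This is all conditional aggregation:

select sum(tot) as tot,
       sum(case when time_lag_1 < time_main then tot else 0 end) as tot_1,
       sum(case when time_lag_2 < time_main then tot else 0 end) as tot_2
from (select id, sum(main) as tot,
             -- earliest time at which each flag equals 1 within the id
             min(case when main = 1 then time end) as time_main,
             min(case when lag_1 = 1 then time end) as time_lag_1,
             min(case when lag_2 = 1 then time end) as time_lag_2
      from t 
      group by id
     ) t;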
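
To try the per-id comparison on the sample rows from the question, here is a minimal sketch that assumes SQLite through Python's standard sqlite3 module, a table named t, and the earliest-time comparison described above. The alias names time_main, time_lag_1 and time_lag_2 are illustrative only.

```python
import sqlite3

# Sample rows from the question: (id, time, main, lag_1, lag_2).
ROWS = [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]

# Per-id conditional aggregation: an id contributes its whole total
# only if the lag flag first appears strictly before the first main = 1.
QUERY = """
select sum(tot),
       sum(case when time_lag_1 < time_main then tot else 0 end),
       sum(case when time_lag_2 < time_main then tot else 0 end)
from (select id, sum(main) as tot,
             min(case when main = 1 then time end) as time_main,
             min(case when lag_1 = 1 then time end) as time_lag_1,
             min(case when lag_2 = 1 then time end) as time_lag_2
      from t
      group by id) s
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id text, time integer, main integer, "
    "lag_1 integer, lag_2 integer)"
)
conn.executemany("insert into t values (?, ?, ?, ?, ?)", ROWS)
per_id_totals = conn.execute(QUERY).fetchone()
print(per_id_totals)  # (7, 0, 3)
```

On these rows the comparison is all-or-nothing per id: B contributes nothing because its first lag_1 = 1 and lag_2 = 1 fall on the same row as its first main = 1, so the result differs from the row-level 3 and 6 expected in the question.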

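Consider the calculation of tot_1 and tot_2.

My first step is to look for rows where lag_1 > main (this matches the case you describe, where lag_1 = 1 is found at some time before main = 1), and likewise for lag_2. I label those rows 'grp_lag_1' and 'grp_lag_2' respectively.

Once the rows are labelled, I "copy" the label down to the later rows of the same id using max() over(partition by id order by id,time1).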

select *
      ,max(case when lag_1 > main then 'grp_lag_1' end) over(partition by id order by id,time1) as grp_1 
      ,max(case when lag_2 > main then 'grp_lag_2' end) over(partition by id order by id,time1) as grp_2 
  from t

So I get the following result

+----+-------+------+-------+-------+-----------+-----------+
| id | time1 | main | lag_1 | lag_2 |   grp_1   |   grp_2   |
+----+-------+------+-------+-------+-----------+-----------+
| A  |    01 |    0 |     0 |     1 |           | grp_lag_2 |
| A  |    03 |    0 |     0 |     1 |           | grp_lag_2 |
| A  |    04 |    0 |     0 |     0 |           | grp_lag_2 |
| A  |    10 |    1 |     0 |     0 |           | grp_lag_2 |
| A  |    11 |    1 |     0 |     0 |           | grp_lag_2 |
| A  |    12 |    1 |     0 |     0 |           | grp_lag_2 |
| B  |    02 |    1 |     1 |     1 |           |           |
| B  |    04 |    0 |     1 |     1 | grp_lag_1 | grp_lag_2 |
| B  |    07 |    0 |     0 |     1 | grp_lag_1 | grp_lag_2 |
| B  |    10 |    1 |     0 |     0 | grp_lag_1 | grp_lag_2 |
| B  |    11 |    1 |     0 |     0 | grp_lag_1 | grp_lag_2 |
| B  |    12 |    1 |     0 |     0 | grp_lag_1 | grp_lag_2 |
+----+-------+------+-------+-------+-----------+-----------+

After this, if I sum the main values over the grp_lag_1 rows I get tot_1, and likewise summing over the grp_lag_2 rows gives tot_2

select sum(main) as tot_cnt
      ,sum(case when grp_1='grp_lag_1' then main end) as tot_1
      ,sum(case when grp_2='grp_lag_2' then main end) as tot_2
  from (
        select *
              ,max(case when lag_1 > main then 'grp_lag_1' end) over(partition by id order by id,time1) as grp_1 
              ,max(case when lag_2 > main then 'grp_lag_2' end) over(partition by id order by id,time1) as grp_2 
          from t
       ) x


+---------+-------+-------+
| tot_cnt | tot_1 | tot_2 |
+---------+-------+-------+
|       7 |     3 |     6 |
+---------+-------+-------+

Demo: https://dbfiddle.uk/?rdbms=sqlserver_2012&fiddle=c17be111dbc3c516afa2bc3dcd3c9e1c
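
The same window-function query can also be checked locally. This is a minimal sketch, assuming SQLite 3.25 or later (the first release with window functions) through Python's standard sqlite3 module, rather than the SQL Server used in the demo. The table name t and column name time1 follow the query above.

```python
import sqlite3

# Sample rows from the question: (id, time1, main, lag_1, lag_2).
ROWS = [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]

# A running max() carries the group label from the first row where
# lag_x > main down to every later row of the same id.
QUERY = """
select sum(main),
       sum(case when grp_1 = 'grp_lag_1' then main end),
       sum(case when grp_2 = 'grp_lag_2' then main end)
from (select *,
             max(case when lag_1 > main then 'grp_lag_1' end)
                 over (partition by id order by id, time1) as grp_1,
             max(case when lag_2 > main then 'grp_lag_2' end)
                 over (partition by id order by id, time1) as grp_2
      from t) x
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id text, time1 integer, main integer, "
    "lag_1 integer, lag_2 integer)"
)
conn.executemany("insert into t values (?, ?, ?, ?, ?)", ROWS)
window_totals = conn.execute(QUERY).fetchone()
print(window_totals)  # (7, 3, 6)
```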

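This is easier to do in a data step. There you can check for the start of a new id and reset the flags that record whether a lag_x variable has been true yet.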

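data want ;
  set have end=eof;
  by id time ;
  tot + main ;                 /* running total of Main over all rows */
  /* at the start of each id, forget the flags from the previous id */
  if first.id then call missing(any_lag_1,any_lag_2);
  /* count Main only if the lag flag was set on an earlier row of this id */
  if any_lag_1 then tot_1 + main ;
  if any_lag_2 then tot_2 + main ;
  if eof then output;          /* write the single summary row */
  /* update the flags after counting, so the current row is excluded */
  any_lag_1+lag_1;
  any_lag_2+lag_2;
  keep tot: ;
run;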
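
For readers without SAS at hand, the row-by-row logic of the data step can be sketched in plain Python. This is an illustration under the assumption that rows are (id, time, main, lag_1, lag_2) tuples; the function name count_after_lag and the seen_lag_x flags are hypothetical names, not part of the SAS code.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (id, time, main, lag_1, lag_2).
ROWS = [
    ("A", 1, 0, 0, 1), ("A", 3, 0, 0, 1), ("A", 4, 0, 0, 0),
    ("A", 10, 1, 0, 0), ("A", 11, 1, 0, 0), ("A", 12, 1, 0, 0),
    ("B", 2, 1, 1, 1), ("B", 4, 0, 1, 1), ("B", 7, 0, 0, 1),
    ("B", 10, 1, 0, 0), ("B", 11, 1, 0, 0), ("B", 12, 1, 0, 0),
]


def count_after_lag(rows):
    """Return (tot, tot_1, tot_2) using flags that reset for each id."""
    tot = tot_1 = tot_2 = 0
    ordered = sorted(rows, key=itemgetter(0, 1))  # sort by id, then time
    for _id, group in groupby(ordered, key=itemgetter(0)):
        # New id: no lag flag has been seen yet.
        seen_lag_1 = seen_lag_2 = False
        for _, _, main, lag_1, lag_2 in group:
            tot += main
            # Count Main only if the flag was set on an earlier row.
            if seen_lag_1:
                tot_1 += main
            if seen_lag_2:
                tot_2 += main
            # Update the flags after counting the current row.
            seen_lag_1 = seen_lag_1 or lag_1 == 1
            seen_lag_2 = seen_lag_2 or lag_2 == 1
    return tot, tot_1, tot_2


step_totals = count_after_lag(ROWS)
print(step_totals)  # (7, 3, 6)
```

Updating the flags only after the counting step is what makes the comparison strict: a row where lag_x and main are both 1 does not count itself.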