How to count number of non-consecutive values in a column using SQL?
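Following up on my earlier question. Suppose I have a table in an Oracle database like the one below (table_1), which tracks service involvement for particular individuals:
name day srvc_inv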
bill 1 1
bill 2 1
bill 3 0
bill 4 0
bill 5 1
bill 6 0
susy 1 1
susy 2 0
susy 3 1
susy 4 0
susy 5 1
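My goal is to get a summary table that lists, for every unique individual, whether they had any service involvement and how many distinct service episodes they had (in this example 2 for bill and 3 for susy). A break in activity across days marks the start of a distinct service episode.
To get any service involvement, I would use the following query: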
SELECT table_1.name, MAX(table_1.srvc_inv) AS any_invl
FROM table_1
GROUP BY table_1.name
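However, I don't know how to get the number of service involvements (2 for bill). With a static data frame in R you would use run-length encoding (see my original question), but I don't know how to accomplish this in SQL. This operation will run over a large number of records, so storing the entire data frame as an object and then processing it in R is impractical.
Edit: My expected output is as follows: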
name any_invl n_srvc_inv
bill 1 2
susy 1 3
Thanks for your help!
Something like this?
with test (name, day, srvc_inv) as
  -- sample data from the question
  (select 'bill', 1, 1 from dual union all
   select 'bill', 2, 1 from dual union all
   select 'bill', 3, 0 from dual union all
   select 'bill', 4, 0 from dual union all
   select 'bill', 5, 1 from dual union all
   select 'bill', 6, 0 from dual union all
   select 'susy', 1, 1 from dual union all
   select 'susy', 2, 0 from dual union all
   select 'susy', 3, 1 from dual union all
   select 'susy', 4, 0 from dual union all
   select 'susy', 5, 1 from dual
  ),
inter as
  -- lsrvc = the next day's srvc_inv for the same person (0 after the last day)
  (select name, day, srvc_inv,
          nvl(lead(srvc_inv) over (partition by name order by day), 0) lsrvc
   from test
  )
-- a row with srvc_inv = 1 whose next value is 0 closes one service episode
select name,
       sum(case when srvc_inv <> lsrvc and lsrvc = 0 then 1
                else 0
           end) grp
from inter
group by name;

NAME        GRP
---- ----------
bill          2
susy          3
You can try the query below, which uses the LAG function to detect changes in srvc_inv:
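select name,
       max(srvc_inv) any_invl,                           -- 1 if the person has any involvement
       count(case when diff = 1 then 1 end) n_srvc_inv   -- a 0 -> 1 step starts a new episode
from (select name, day, srvc_inv,
             -- partition by name so one person's last row is never compared with the next person's first row
             srvc_inv - LAG(srvc_inv, 1, 0) OVER (PARTITION BY name ORDER BY day) diff
      from tab) temp
group by name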
Here is a fiddle for your reference.
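One detail worth checking in a LAG-based query like this is the window definition. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module as a stand-in for Oracle (an assumption, and it needs SQLite 3.25 or later for window functions), and the four rows are made-up test data, not the table from the question. It compares a window ordered by name and day with one partitioned by name.

```python
import sqlite3

# Hypothetical data: bill's last value and susy's first value are both 1.
ROWS = [("bill", 1, 0), ("bill", 2, 1), ("susy", 1, 1), ("susy", 2, 0)]

TEMPLATE = """
select name, count(case when diff = 1 then 1 end) as n_srvc_inv
from (select name, day,
             srvc_inv - lag(srvc_inv, 1, 0) over ({window}) as diff
      from tab) temp
group by name
order by name
"""


def count_episodes(window):
    """Run the diff-based count with the given window clause."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table tab (name text, day integer, srvc_inv integer)")
        conn.executemany("insert into tab values (?, ?, ?)", ROWS)
        return conn.execute(TEMPLATE.format(window=window)).fetchall()
    finally:
        conn.close()


# Without a partition, susy's first row is compared with bill's last row.
unpartitioned = count_episodes("order by name, day")
# With a partition, each person's first row is compared with the default 0.
partitioned = count_episodes("partition by name order by day")

if __name__ == "__main__":
    print(unpartitioned)
    print(partitioned)
```

With these rows the unpartitioned window misses susy's episode, because her first value is subtracted from bill's last one. The sample data in the question does not hit this case, since bill's last value there is 0.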
I would suggest using lag(). The idea is to count a "1", but only when the preceding value is zero or null:
select name, count(*)
from (select t.*,
lag(srvc_inv) over (partition by name order by day) as prev_srvc_inv
from t
) t
where (prev_srvc_inv is null or prev_srvc_inv = 0) and
srvc_inv = 1
group by name;
You can simplify this a bit by using the default value for lag():
select name, count(*)
from (select t.*,
lag(srvc_inv, 1, 0) over (partition by name order by day) as prev_srvc_inv
from t
) t
where prev_srvc_inv = 0 and srvc_inv = 1
group by name;
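If you want to try the lag() idea without an Oracle instance, here is a minimal sketch that runs it against the sample data from the question. It makes a few assumptions: SQLite (via Python's sqlite3 module, version 3.25 or later for window functions) stands in for Oracle, the table is named t as in the query above, and the any_invl column and the summarize helper are additions to match the expected output in the question.

```python
import sqlite3

# Sample rows from the question: (name, day, srvc_inv)
ROWS = [
    ("bill", 1, 1), ("bill", 2, 1), ("bill", 3, 0),
    ("bill", 4, 0), ("bill", 5, 1), ("bill", 6, 0),
    ("susy", 1, 1), ("susy", 2, 0), ("susy", 3, 1),
    ("susy", 4, 0), ("susy", 5, 1),
]

QUERY = """
select name,
       max(srvc_inv) as any_invl,
       -- an episode starts where the value is 1 and the previous value is 0
       sum(case when prev_srvc_inv = 0 and srvc_inv = 1 then 1 else 0 end) as n_srvc_inv
from (select t.*,
             lag(srvc_inv, 1, 0) over (partition by name order by day) as prev_srvc_inv
      from t
     ) x
group by name
order by name
"""


def summarize(rows):
    """Load rows into an in-memory table and return (name, any_invl, n_srvc_inv) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (name text, day integer, srvc_inv integer)")
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```

The sketch uses a conditional sum instead of a where filter, so a person with no involvement at all still appears in the result with 0 in both columns rather than dropping out.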