Counting the number of rows between transitions of values in SQL
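I have rows containing a user_id, a timestamp, and a YES/NO response. For each ID, I want to count the length of every streak of "NO" responses (consecutive rows).
Example: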
user_id | timestamp | response | no_streak |
---|---|---|---|
1 | 2021-01-20 13:59:26 | YES | 0 |
1 | 2021-01-20 14:01:27 | NO | 1 |
1 | 2021-01-20 14:03:21 | NO | 2 |
1 | 2021-01-20 14:07:29 | NO | 3 |
1 | 2021-01-20 14:09:22 | YES | 0 |
1 | 2021-01-20 14:11:26 | YES | 0 |
1 | 2021-01-20 14:13:30 | NO | 1 |
1 | 2021-01-20 14:17:26 | NO | 2 |
1 | 2021-01-20 14:19:29 | YES | 0 |
1 | 2021-01-20 14:25:30 | NO | 1 |
1 | 2021-01-20 14:27:23 | NO | 2 |
1 | 2021-01-20 14:31:23 | NO | 3 |
1 | 2021-01-20 14:35:27 | NO | 4 |
1 | 2021-01-20 14:39:24 | YES | 0 |
2 | 2021-01-20 14:39:24 | NO | 1 |
2 | 2021-01-20 14:47:28 | NO | 2 |
2 | 2021-01-20 14:49:22 | NO | 3 |
2 | 2021-01-20 14:51:25 | NO | 4 |
2 | 2021-01-20 14:53:29 | NO | 5 |
2 | 2021-01-20 14:55:22 | NO | 6 |
2 | 2021-01-20 14:57:22 | YES | 0 |
Ultimately, I want to know how long each streak is for each user:
user_id | streak length |
---|---|
1 | 0 |
1 | 3 |
1 | 2 |
1 | 4 |
2 | 0 |
2 | 6 |
I can use LAG() to find where the response changes from NO to YES and vice versa, but I am struggling to count the rows between transitions.
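For each row, compute a running count of the "YES" responses so far, so that adjacent "NO" rows share the same grouping value. Then filter and aggregate: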
select t.user_id, count(*), min(timestamp), max(timestamp)
from (select t.*,
             -- running count of YES rows per user: consecutive NO rows share one grp value
             sum(case when response = 'YES' then 1 else 0 end) over (partition by user_id order by timestamp) as grp
      from t
     ) t
where response = 'NO'
group by user_id, grp;
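
To check the grouping idea against the sample data, here is a minimal sketch that runs the same query through Python's `sqlite3` module. The choice of SQLite (3.25 or later, for window functions), the in-memory table, the function name `no_streaks`, the column aliases, and the final `order by` are assumptions added for illustration; they are not part of the original query.

```python
import sqlite3

# Sample rows from the question: (user_id, timestamp, response).
ROWS = [
    (1, "2021-01-20 13:59:26", "YES"),
    (1, "2021-01-20 14:01:27", "NO"),
    (1, "2021-01-20 14:03:21", "NO"),
    (1, "2021-01-20 14:07:29", "NO"),
    (1, "2021-01-20 14:09:22", "YES"),
    (1, "2021-01-20 14:11:26", "YES"),
    (1, "2021-01-20 14:13:30", "NO"),
    (1, "2021-01-20 14:17:26", "NO"),
    (1, "2021-01-20 14:19:29", "YES"),
    (1, "2021-01-20 14:25:30", "NO"),
    (1, "2021-01-20 14:27:23", "NO"),
    (1, "2021-01-20 14:31:23", "NO"),
    (1, "2021-01-20 14:35:27", "NO"),
    (1, "2021-01-20 14:39:24", "YES"),
    (2, "2021-01-20 14:39:24", "NO"),
    (2, "2021-01-20 14:47:28", "NO"),
    (2, "2021-01-20 14:49:22", "NO"),
    (2, "2021-01-20 14:51:25", "NO"),
    (2, "2021-01-20 14:53:29", "NO"),
    (2, "2021-01-20 14:55:22", "NO"),
    (2, "2021-01-20 14:57:22", "YES"),
]

# Same logic as the query above; aliases and ORDER BY are added for readability.
QUERY = """
select user_id, count(*) as streak_length,
       min(timestamp) as first_no, max(timestamp) as last_no
from (select t.*,
             sum(case when response = 'YES' then 1 else 0 end)
                 over (partition by user_id order by timestamp) as grp
      from t
     ) t
where response = 'NO'
group by user_id, grp
order by user_id, first_no
"""


def no_streaks(rows):
    """Load rows into an in-memory table and return one tuple per NO streak."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (user_id integer, timestamp text, response text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in no_streaks(ROWS):
        print(row)
```

On the sample data this yields streak lengths 3, 2, and 4 for user 1 and 6 for user 2. The timestamps are stored as ISO-formatted text here, which sorts chronologically; a real table would more likely use a native timestamp type.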
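Note: this does not return streaks of length 0. I'm not sure "streak" is the right word for those. But to get them, remove the where filter and use conditional aggregation: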
select t.user_id,
       -- count only the NO rows in each group; a group with no NO rows yields 0
       sum(case when response = 'NO' then 1 else 0 end),
       min(timestamp), max(timestamp)
from (select t.*,
             sum(case when response = 'YES' then 1 else 0 end) over (partition by user_id order by timestamp) as grp
      from t
     ) t
group by user_id, grp;
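
As a rough illustration of how the conditional aggregation behaves, the sketch below runs it on the first eight sample rows for user 1 using Python's `sqlite3` module. SQLite (3.25 or later), the function name `no_counts_per_group`, the `no_count` alias, and the `order by` are assumptions made for this example only.

```python
import sqlite3

# First eight sample rows for user 1: (user_id, timestamp, response).
ROWS = [
    (1, "2021-01-20 13:59:26", "YES"),
    (1, "2021-01-20 14:01:27", "NO"),
    (1, "2021-01-20 14:03:21", "NO"),
    (1, "2021-01-20 14:07:29", "NO"),
    (1, "2021-01-20 14:09:22", "YES"),
    (1, "2021-01-20 14:11:26", "YES"),
    (1, "2021-01-20 14:13:30", "NO"),
    (1, "2021-01-20 14:17:26", "NO"),
]

# No WHERE filter: every group is kept, and only its NO rows are counted.
QUERY = """
select user_id, grp,
       sum(case when response = 'NO' then 1 else 0 end) as no_count
from (select t.*,
             sum(case when response = 'YES' then 1 else 0 end)
                 over (partition by user_id order by timestamp) as grp
      from t
     ) t
group by user_id, grp
order by user_id, grp
"""


def no_counts_per_group(rows):
    """Return (user_id, grp, no_count) for every group, including empty ones."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (user_id integer, timestamp text, response text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in no_counts_per_group(ROWS):
        print(row)
```

Each group starts at a YES row and covers the NO rows that follow it, so the counts here come out as 3, 0, 2. The zero belongs to the YES at 14:09:22, which is followed directly by another YES. Zero-length groups therefore appear wherever a YES has no NO rows after it, which may not line up row for row with the expected table in the question.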