How to count how many consecutive days a user is tagged as orange in MySQL?
I want to count how many consecutive days, up to today, a user has been tagged as orange. I have the following table:
CREATE TABLE `survey_daily` (
`id` int(11) NOT NULL,
`user_id` varchar(30) NOT NULL,
`color` varchar(10) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `survey_daily` (`id`, `user_id`, `color`, `timestamp`) VALUES
(1, '3236', "ORANGE", '2020-05-12 02:40:59'),
(2, '3236', "WHITE", '2020-05-13 02:40:59'),
(3, '3236', "ORANGE", '2020-05-14 02:40:59'),
(4, '3236', "ORANGE", '2020-05-15 02:40:59'),
(5, '3237', "ORANGE", '2020-05-15 02:40:59'),
(6, '3237', "ORANGE", '2020-05-16 02:40:59'),
(7, '3236', "ORANGE", '2020-05-16 02:40:59');
Fiddle: http://sqlfiddle.com/#!9/40cb26/1.
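Basically, the table holds multiple users, and I want to count the number of consecutive days each user has been tagged as orange.
In my example, user ID 3236 should have 3 consecutive orange days and user 3237 should have 2, counting up to today. If a user has no record for today, the result should be 0.
Thanks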
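This is a gaps-and-islands problem. If you are running MySQL 8.0, one approach is to use the difference between two row_number() values to build groups of consecutive records with the same color for each user, and then aggregate: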
select
user_id,
count(*) no_records,
min(timestamp) start_timestamp,
max(timestamp) max_timestamp
from (
select
s.*,
row_number() over(partition by user_id order by timestamp) rn1,
row_number() over(partition by user_id, color order by timestamp) rn2
from survey_daily s
) t
where color = 'ORANGE'
group by user_id, rn1 - rn2
order by user_id, start_timestamp
This produces one record per user for each series of adjacent orange records:
user_id | no_records | start_timestamp | max_timestamp
:------ | ---------: | :------------------ | :------------------
3236 | 1 | 2020-05-12 02:40:59 | 2020-05-12 02:40:59
3236 | 3 | 2020-05-14 02:40:59 | 2020-05-16 02:40:59
3237 | 2 | 2020-05-15 02:40:59 | 2020-05-16 02:40:59
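To see why grouping on rn1 - rn2 isolates each streak, it can help to run the inner query on its own. The sketch below is the same subquery with one extra computed column; the alias grp is my own illustrative name and does not appear in the answer.

```sql
-- Inspect both row numbers and their difference for every record.
-- grp is an illustrative alias; the answer groups on rn1 - rn2 directly.
select
    user_id,
    color,
    timestamp,
    rn1,
    rn2,
    rn1 - rn2 grp
from (
    select
        s.*,
        -- position of the record among all of the user's records
        row_number() over(partition by user_id order by timestamp) rn1,
        -- position of the record among the user's records of the same color
        row_number() over(partition by user_id, color order by timestamp) rn2
    from survey_daily s
) t
order by user_id, timestamp
```

With the sample data, user 3236's first orange record gets grp 0, while the three orange records from 2020-05-14 to 2020-05-16 all get grp 1, because the WHITE record on 2020-05-13 advances rn1 but not the orange rn2. The WHITE record itself also ends up with grp 1, which is why the filter on color is still needed before grouping.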
If you want only the longest streak per user, you can aggregate on top of this, or use window functions again:
select *
from (
select
user_id,
count(*) no_records,
min(timestamp) start_timestamp,
max(timestamp) max_timestamp,
row_number() over(partition by user_id order by count(*) desc) rn
from (
select
s.*,
row_number() over(partition by user_id order by timestamp) rn1,
row_number() over(partition by user_id, color order by timestamp) rn2
from survey_daily s
) t
where color = 'ORANGE'
group by user_id, rn1 - rn2
) t
where rn = 1
order by user_id, start_timestamp
user_id | no_records | start_timestamp | max_timestamp | rn
:------ | ---------: | :------------------ | :------------------ | -:
3236 | 3 | 2020-05-14 02:40:59 | 2020-05-16 02:40:59 | 1
3237 | 2 | 2020-05-15 02:40:59 | 2020-05-16 02:40:59 | 1
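The question asks for the streak that is still running today, returning 0 otherwise, which the queries above do not restrict to. One possible adaptation, offered as a sketch rather than as part of the original answer, keeps only the island whose latest record falls on the current date and left joins it to the list of users. It assumes at most one record per user per day, no missing days inside a streak, and that current_date in the session time zone matches the dates stored in timestamp. The aliases u, st and current_streak are illustrative.

```sql
-- Current orange streak per user, 0 when the user has no orange record today.
-- Assumption: one record per user per day, so count(*) equals the number of days.
select
    u.user_id,
    coalesce(st.no_records, 0) current_streak
from (
    select distinct user_id from survey_daily
) u
left join (
    select
        user_id,
        count(*) no_records
    from (
        select
            s.*,
            row_number() over(partition by user_id order by timestamp) rn1,
            row_number() over(partition by user_id, color order by timestamp) rn2
        from survey_daily s
    ) t
    where color = 'ORANGE'
    group by user_id, rn1 - rn2
    -- keep only the streak that reaches today
    having date(max(timestamp)) = current_date
) st on st.user_id = u.user_id
order by u.user_id
```

Because the sample data ends on 2020-05-16, running this on any later date returns 0 for both users; replacing current_date with the literal '2020-05-16' is a simple way to try it against the fiddle data.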
SELECT t1.user_id, MAX(1 + DATEDIFF(t2.`timestamp`, t1.`timestamp`)) max_delta
FROM survey_daily t1
JOIN survey_daily t2 ON t1.user_id = t2.user_id
WHERE t1.color = 'ORANGE'
AND t2.color = 'ORANGE'
AND t1.`timestamp` <= t2.`timestamp`
AND NOT EXISTS ( SELECT NULL
FROM survey_daily t3
WHERE t1.user_id = t3.user_id
AND t3.color != 'ORANGE'
AND t1.`timestamp` < t3.`timestamp`
AND t3.`timestamp` < t2.`timestamp` )
GROUP BY t1.user_id;
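Logic: take all pairs of records for a user where both records are orange and no record of another color lies between them. Compute the distance in days for each pair. Take the maximum value.
fiddle (thanks to GMB for the fiddle from which the source data script was taken).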
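PS. If a user has no orange records at all, that user is not returned. If you need such users too, take another copy of the survey_daily table, left join my query to it as a subquery on user_id, then take the user from the table and the number of consecutive days from the subquery (wrapped in COALESCE to convert NULL to zero).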
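The left join described in the PS could look roughly like the following. This is a sketch: the aliases u and s are illustrative, and selecting DISTINCT user_id from survey_daily is my own choice, made so that each user appears once rather than once per record.

```sql
-- Every user, with 0 for users that have no orange records at all.
SELECT u.user_id, COALESCE(s.max_delta, 0) max_delta
FROM ( SELECT DISTINCT user_id
       FROM survey_daily ) u
LEFT JOIN ( -- the query from this answer, unchanged
            SELECT t1.user_id, MAX(1 + DATEDIFF(t2.`timestamp`, t1.`timestamp`)) max_delta
            FROM survey_daily t1
            JOIN survey_daily t2 ON t1.user_id = t2.user_id
            WHERE t1.color = 'ORANGE'
              AND t2.color = 'ORANGE'
              AND t1.`timestamp` <= t2.`timestamp`
              AND NOT EXISTS ( SELECT NULL
                               FROM survey_daily t3
                               WHERE t1.user_id = t3.user_id
                                 AND t3.color != 'ORANGE'
                                 AND t1.`timestamp` < t3.`timestamp`
                                 AND t3.`timestamp` < t2.`timestamp` )
            GROUP BY t1.user_id ) s ON s.user_id = u.user_id;
```

Note that DATEDIFF measures the calendar span between the first and last orange record of a pair, so a day with no record at all inside an orange run is still counted as part of the streak.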