How to count how many consecutive days a user has been tagged as orange in MySQL?

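I would like to know how to count the number of consecutive days, up to and including today, that a user has been tagged as orange. I have the following table: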


CREATE TABLE `survey_daily` (
  `id` int(11) NOT NULL,
  `user_id` varchar(30) NOT NULL,
  `color` varchar(10) NOT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


INSERT INTO `survey_daily` (`id`, `user_id`, `color`, `timestamp`) VALUES
(1, '3236', "ORANGE", '2020-05-12 02:40:59'),
(2, '3236', "WHITE", '2020-05-13 02:40:59'),
(3, '3236', "ORANGE", '2020-05-14 02:40:59'),
(4, '3236', "ORANGE", '2020-05-15 02:40:59'),
(5, '3237', "ORANGE", '2020-05-15 02:40:59'),
(6, '3237', "ORANGE", '2020-05-16 02:40:59'),
(7, '3236', "ORANGE", '2020-05-16 02:40:59');

Fiddle: http://sqlfiddle.com/#!9/40cb26/1.

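Basically, the table holds records for multiple users, and I want to count the number of consecutive days each user has been tagged as orange.

In my example, user ID 3236 should have 3 consecutive orange days and user 3237 should have 2, counting up to today. If a user has no record for today, the count should be 0.

Thanks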

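This is a gaps-and-islands problem. If you are running MySQL 8.0, one approach is to use the difference between two row_number() values to put each user's consecutive records of the same color into a group, and then aggregate: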

select 
    user_id, 
    count(*) no_records, 
    min(timestamp) start_timestamp, 
    max(timestamp) max_timestamp
from (
    select 
        s.*,
        row_number() over(partition by user_id order by timestamp) rn1,
        row_number() over(partition by user_id, color order by timestamp) rn2
    from survey_daily s
) t
where color = 'ORANGE'
group by user_id, rn1 - rn2
order by user_id, start_timestamp

This produces one row for each series of adjacent orange records of each user:

user_id | no_records | start_timestamp     | max_timestamp      
:------ | ---------: | :------------------ | :------------------
3236    |          1 | 2020-05-12 02:40:59 | 2020-05-12 02:40:59
3236    |          3 | 2020-05-14 02:40:59 | 2020-05-16 02:40:59
3237    |          2 | 2020-05-15 02:40:59 | 2020-05-16 02:40:59
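
To see why the difference `rn1 - rn2` isolates each streak, it may help to run the inner query on its own. The following is an illustrative sketch against the sample data from the question; the `grp` alias is a hypothetical name introduced here and is not part of the original query.

```sql
-- Show both row numbers and their difference for every record.
-- `grp` is a hypothetical alias added for illustration only.
select
    id,
    user_id,
    color,
    row_number() over(partition by user_id order by timestamp) rn1,
    row_number() over(partition by user_id, color order by timestamp) rn2,
    row_number() over(partition by user_id order by timestamp)
      - row_number() over(partition by user_id, color order by timestamp) grp
from survey_daily
order by user_id, timestamp;
```

For user 3236 the difference is 0 for the first orange record and 1 for the three later ones, so they land in two separate groups. The white record in between also gets a difference of 1, which is why the outer query filters on color before grouping.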

If you only want the longest streak for each user, you can aggregate on top of this, or use window functions once more:

select *
from (
    select 
        user_id, 
        count(*) no_records, 
        min(timestamp) start_timestamp, 
        max(timestamp) max_timestamp,
        row_number() over(partition by user_id order by count(*) desc) rn
    from (
        select 
            s.*,
            row_number() over(partition by user_id order by timestamp) rn1,
            row_number() over(partition by user_id, color order by timestamp) rn2
        from survey_daily s
    ) t
    where color = 'ORANGE'
    group by user_id, rn1 - rn2
) t
where rn = 1
order by user_id, start_timestamp

user_id | no_records | start_timestamp     | max_timestamp       | rn
:------ | ---------: | :------------------ | :------------------ | -:
3236    |          3 | 2020-05-14 02:40:59 | 2020-05-16 02:40:59 |  1
3237    |          2 | 2020-05-15 02:40:59 | 2020-05-16 02:40:59 |  1
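
The question asks specifically for the streak that runs up to today, with 0 when there is no record for today. Below is a minimal sketch of one way the query above might be adapted. It assumes one record per user per day with no missing days (so consecutive records mean consecutive days) and takes "today" to be the server's `current_date`. The `current_streak` alias and the derived table names `u` and `k` are hypothetical.

```sql
-- Sketch: length of each user's orange streak that ends today, else 0.
-- Assumption: one record per user per day, with no skipped days.
select
    u.user_id,
    coalesce(k.no_records, 0) as current_streak
from (select distinct user_id from survey_daily) u
left join (
    select user_id, count(*) as no_records
    from (
        select
            s.*,
            row_number() over(partition by user_id order by timestamp) rn1,
            row_number() over(partition by user_id, color order by timestamp) rn2
        from survey_daily s
    ) t
    where color = 'ORANGE'
    group by user_id, rn1 - rn2
    -- keep only the streak whose last record falls on today
    having date(max(timestamp)) = current_date
) k on k.user_id = u.user_id
order by u.user_id;
```

With the sample data this only yields non-zero values when it is run on 2020-05-16. To try it on another day, `current_date` can be replaced by a date literal such as `date '2020-05-16'`, in which case user 3236 gets 3 and user 3237 gets 2.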

Demo on DB Fiddle

SELECT t1.user_id, MAX(1 + DATEDIFF(t2.`timestamp`, t1.`timestamp`)) max_delta
FROM survey_daily t1
JOIN survey_daily t2 ON t1.user_id = t2.user_id
WHERE t1.color = 'ORANGE'
  AND t2.color = 'ORANGE'
  AND t1.`timestamp` <= t2.`timestamp`
  AND NOT EXISTS ( SELECT NULL
                   FROM survey_daily t3
                   WHERE t1.user_id = t3.user_id
                     AND t3.color != 'ORANGE'
                     AND t1.`timestamp` < t3.`timestamp`
                     AND t3.`timestamp` < t2.`timestamp` )
GROUP BY t1.user_id;

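Logic: take all pairs of a user's records where both records are orange and no record of another color lies between them. Compute the distance in days for each pair. Take the maximum of these values.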
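
The pairs described above can be inspected directly by dropping the aggregation. This is only an illustrative sketch of the same joins and filters; the column aliases `streak_start`, `streak_end` and `days` are hypothetical names added here.

```sql
-- List every qualifying pair of orange records with its length in days,
-- i.e. the rows that MAX() aggregates over in the query above.
SELECT t1.user_id,
       t1.`timestamp` AS streak_start,
       t2.`timestamp` AS streak_end,
       1 + DATEDIFF(t2.`timestamp`, t1.`timestamp`) AS days
FROM survey_daily t1
JOIN survey_daily t2 ON t1.user_id = t2.user_id
WHERE t1.color = 'ORANGE'
  AND t2.color = 'ORANGE'
  AND t1.`timestamp` <= t2.`timestamp`
  -- reject the pair if any non-orange record lies strictly between
  AND NOT EXISTS ( SELECT NULL
                   FROM survey_daily t3
                   WHERE t1.user_id = t3.user_id
                     AND t3.color != 'ORANGE'
                     AND t1.`timestamp` < t3.`timestamp`
                     AND t3.`timestamp` < t2.`timestamp` )
ORDER BY t1.user_id, streak_start, streak_end;
```

For user 3236, the pair running from 2020-05-12 to 2020-05-14 is rejected because the white record of 2020-05-13 lies between them. The longest remaining pair runs from 2020-05-14 to 2020-05-16 and spans 3 days.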

fiddle (thanks to GMB for the fiddle from which the source data script was taken).

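PS. If a user has no orange records at all, that user is not returned. If you need such users too, take another copy of the survey_daily table, left join my query to it as a subquery on user_id, then take the user from the table and the number of consecutive days from the subquery (wrap it in COALESCE to convert NULL to zero).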
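
The left join described in the PS might look roughly like the sketch below. Using `SELECT DISTINCT user_id` to get one row per user is an assumption made here (the PS only says to take a copy of the table), and the derived table names `u` and `s` are hypothetical.

```sql
-- Sketch: return every user, with 0 for users who have no orange records.
SELECT u.user_id,
       COALESCE(s.max_delta, 0) AS max_delta
FROM ( SELECT DISTINCT user_id FROM survey_daily ) u
LEFT JOIN (
    -- the query from above, unchanged, used as a subquery
    SELECT t1.user_id,
           MAX(1 + DATEDIFF(t2.`timestamp`, t1.`timestamp`)) max_delta
    FROM survey_daily t1
    JOIN survey_daily t2 ON t1.user_id = t2.user_id
    WHERE t1.color = 'ORANGE'
      AND t2.color = 'ORANGE'
      AND t1.`timestamp` <= t2.`timestamp`
      AND NOT EXISTS ( SELECT NULL
                       FROM survey_daily t3
                       WHERE t1.user_id = t3.user_id
                         AND t3.color != 'ORANGE'
                         AND t1.`timestamp` < t3.`timestamp`
                         AND t3.`timestamp` < t2.`timestamp` )
    GROUP BY t1.user_id
) s ON u.user_id = s.user_id
ORDER BY u.user_id;
```

A user whose records are all non-orange has no matching row in `s`, so `s.max_delta` is NULL and COALESCE turns it into 0.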