How to determine if records (for a certain set) follow a correct order?

I have a table that contains 3 different statuses:

1. CLICKED
2. CLAIMED
3. BOUGHT

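They are expected to occur in that specific order. I am trying to determine whether any records do not follow the correct order according to their dates.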

For example, with the sample data defined by the statements further down:

Record 121144 has the correct status order.
Record 121200 is incorrect because BOUGHT happens before CLICKED, even though CLICKED and CLAIMED follow the right order.
Record 121122 is incorrect because CLICKED comes after CLAIMED.
Record 121111 is also correct (even though all three dates are the same).
Record 121198 is also correct because the statuses that are present follow the order, even though there is no BOUGHT.

CREATE TABLE TBL_A 
(
    number_id int, 
    country varchar(50), 
    status varchar(50), 
    datetime date
);

INSERT INTO TBL_A 
VALUES (121144, 'USA', 'CLICKED', '2021-10-09'),
       (121144, 'USA', 'CLAIMED', '2021-10-10'),
       (121144, 'USA', 'BOUGHT', '2021-10-11'),
       (121111, 'CAD', 'CLICKED', '2021-10-12'),
       (121111, 'CAD', 'CLAIMED', '2021-10-12'),
       (121111, 'CAD', 'BOUGHT', '2021-10-12'),
       (121122, 'PES', 'CLICKED', '2021-09-11'),
       (121122, 'PES', 'CLAIMED', '2021-09-09'),
       (121122, 'PES', 'BOUGHT', '2021-09-12'),
       (121198, 'AU', 'CLICKED', '2021-09-11'),
       (121198, 'AU', 'CLAIMED', '2021-09-12'),
       (121200, 'POR', 'CLICKED', '2021-09-10'),
       (121200, 'POR', 'CLAIMED', '2021-09-11'),
       (121200, 'POR', 'BOUGHT', '2021-09-08');

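Here is one approach that uses string aggregation and some string manipulation. It works as expected on the sample data and also handles edge cases such as skipped statuses, missing statuses, and a single status.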

with cte as
(
    select *,
           -- order each record's statuses by date; ties fall back to the intended status order
           listagg(status, '>') within group (order by datetime, charindex(status, 'CLICKED>CLAIMED>BOUGHT'))
               over (partition by number_id, country) as event_order
    from TBL_A
)

select distinct 
       number_id,
       country, 
       case when charindex(event_order,'CLICKED>CLAIMED>BOUGHT,CLICKED>BOUGHT')>0 then 'Ordered' else 'Unordered' end as order_flag
from cte
order by number_id;
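
To make the string-matching idea easier to follow outside of SQL, here is a minimal Python sketch of the same logic (it is not the query itself). The names `ROWS`, `SEQUENCE`, `ALLOWED` and `order_flag` are made up for illustration, and the sketch assumes the dates are ISO-formatted strings so that plain string comparison sorts them chronologically.

```python
from collections import defaultdict

SEQUENCE = "CLICKED>CLAIMED>BOUGHT"
# Same pattern list as the query: the full path plus the path that skips CLAIMED.
ALLOWED = "CLICKED>CLAIMED>BOUGHT,CLICKED>BOUGHT"

# (number_id, status, date) rows copied from the sample data.
ROWS = [
    (121144, "CLICKED", "2021-10-09"),
    (121144, "CLAIMED", "2021-10-10"),
    (121144, "BOUGHT", "2021-10-11"),
    (121111, "CLICKED", "2021-10-12"),
    (121111, "CLAIMED", "2021-10-12"),
    (121111, "BOUGHT", "2021-10-12"),
    (121122, "CLICKED", "2021-09-11"),
    (121122, "CLAIMED", "2021-09-09"),
    (121122, "BOUGHT", "2021-09-12"),
    (121198, "CLICKED", "2021-09-11"),
    (121198, "CLAIMED", "2021-09-12"),
    (121200, "CLICKED", "2021-09-10"),
    (121200, "CLAIMED", "2021-09-11"),
    (121200, "BOUGHT", "2021-09-08"),
]


def order_flag(rows):
    """Map each number_id to 'Ordered' or 'Unordered'."""
    groups = defaultdict(list)
    for number_id, status, day in rows:
        # Sort key: date first, then position of the status in SEQUENCE (tie-breaker).
        groups[number_id].append((day, SEQUENCE.find(status), status))

    flags = {}
    for number_id, events in groups.items():
        events.sort()
        event_order = ">".join(status for _, _, status in events)
        # Substring test, like charindex(event_order, ...) > 0 in the query.
        flags[number_id] = "Ordered" if event_order in ALLOWED else "Unordered"
    return flags


FLAGS = order_flag(ROWS)
```

Because the check is a substring test, partial sequences such as `CLICKED>CLAIMED` or a lone `CLICKED` also count as ordered, which is how the missing-status and single-status cases are covered.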

   

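My answer accounts for the possibly skipped steps that the OP mentioned in the comments. Rather than matching against a strict sequence, this approach looks for adjacent pairs of rows in which the earlier row has the higher step number: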

with A as (
    select *,        
        case status
            when 'CLICKED' then 1
            when 'CLAIMED' then 2
            when 'BOUGHT'  then 3 end as desired_order
    from TBL_A
), B as (
    select *,
        row_number() over (
            partition by number_id
            order by datetime, desired_order) as rn -- handles date ties
    from A
), C as (
    select *,
        -- look for pairs of rows where one is reversed
        case when lag(desired_order) over (partition by number_id order by rn) >
            desired_order then 'Y' end as flag
    from B
)
select number_id, min(country) as country,
    case min(flag) when 'Y' then 'Out of order' else 'In order' end as "status"
from C
group by number_id;

https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=f0ee1de8e8e81229ddc23acc97bce7d7
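
The adjacent-pair check can also be sketched in Python, purely as an illustration of the logic rather than a replacement for the query. `RANK`, `SAMPLE` and `out_of_order_ids` are hypothetical names, the rows are a subset of the sample data, and dates are assumed to be ISO-formatted strings.

```python
from itertools import groupby

# Step numbers, mirroring the desired_order CASE expression.
RANK = {"CLICKED": 1, "CLAIMED": 2, "BOUGHT": 3}

# (number_id, status, date) rows taken from the sample data.
SAMPLE = [
    (121111, "CLICKED", "2021-10-12"),
    (121111, "CLAIMED", "2021-10-12"),
    (121111, "BOUGHT", "2021-10-12"),
    (121122, "CLICKED", "2021-09-11"),
    (121122, "CLAIMED", "2021-09-09"),
    (121122, "BOUGHT", "2021-09-12"),
    (121198, "CLICKED", "2021-09-11"),
    (121198, "CLAIMED", "2021-09-12"),
    (121200, "CLICKED", "2021-09-10"),
    (121200, "CLAIMED", "2021-09-11"),
    (121200, "BOUGHT", "2021-09-08"),
]


def out_of_order_ids(rows):
    """Return the ids that have an adjacent pair whose earlier row has the higher step number."""
    # Sort by id, then date, then step number (the step number breaks date ties).
    keyed = sorted((number_id, day, RANK[status]) for number_id, status, day in rows)
    flagged = set()
    for number_id, group in groupby(keyed, key=lambda row: row[0]):
        ranks = [rank for _, _, rank in group]
        # Compare each row with the one before it, like lag() in the query.
        if any(prev > cur for prev, cur in zip(ranks, ranks[1:])):
            flagged.add(number_id)
    return flagged


FLAGGED = out_of_order_ids(SAMPLE)
```

Since only neighbouring rows are compared, a record that goes straight from CLICKED to BOUGHT is not flagged, which is the skipped-step behaviour described above.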

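As Thorston pointed out, you can also generate a pair of row numbers and compare the two to find mismatches. Judging by the query plan, this probably involves an extra sort operation, so it is worth trying both approaches against your data.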

...
), B as (
    select *,
        row_number() over (
            partition by number_id
            order by desired_order) as rn1,
        row_number() over (
            partition by number_id
            order by datetime, desired_order) as rn2
    from A
)
select
  number_id, min(country) as country,
  case when max(case when rn1 <> rn2 then 1 else 0 end) = 1
    then 'Out of order' else 'In order' end as status
...
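
As a rough illustration of the two-row-number comparison, here is a small Python sketch. `RANK` and `is_out_of_order` are hypothetical names, dates are assumed to be ISO-formatted strings, and each status is assumed to appear at most once per `number_id` (otherwise the first numbering would be ambiguous).

```python
# Step numbers, mirroring the desired_order CASE expression.
RANK = {"CLICKED": 1, "CLAIMED": 2, "BOUGHT": 3}


def is_out_of_order(events):
    """events: list of (status, iso_date) tuples for a single number_id."""
    # rn1: position when ordered by the desired step number only.
    by_rank = sorted(events, key=lambda e: RANK[e[0]])
    rn1 = {event: i for i, event in enumerate(by_rank, start=1)}
    # rn2: position when ordered by date, with the step number as tie-breaker.
    by_date = sorted(events, key=lambda e: (e[1], RANK[e[0]]))
    rn2 = {event: i for i, event in enumerate(by_date, start=1)}
    # Any row whose two positions differ means the record is out of order.
    return any(rn1[event] != rn2[event] for event in events)


RECORD_121122 = [("CLICKED", "2021-09-11"), ("CLAIMED", "2021-09-09"), ("BOUGHT", "2021-09-12")]
RECORD_121198 = [("CLICKED", "2021-09-11"), ("CLAIMED", "2021-09-12")]
```

With the sample rows above, `is_out_of_order(RECORD_121122)` is true because CLAIMED is first by date but second by step number, while `is_out_of_order(RECORD_121198)` is false.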

Using ARRAY_AGG ordered by datetime:

SELECT number_id, 
    ARRAY_AGG(status) WITHIN GROUP(ORDER BY datetime) AS statuses, -- debug
    CASE WHEN ARRAY_AGG(status) WITHIN GROUP(ORDER BY datetime) 
         IN (ARRAY_CONSTRUCT('CLICKED', 'CLAIMED', 'BOUGHT'),
             ARRAY_CONSTRUCT('CLICKED', 'CLAIMED')) THEN 'In order'
         ELSE 'Out of order'
    END AS status
FROM TBL_A
GROUP BY number_id;

Output: