How to use row_number to count dates after a desired anchor date?

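I have a dataset in which I want to count the rows before and after an anchor date. I think a window function using row_number() could do it, but I'm not sure how to write it.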

My current table:

order_id    contact_id    placed_at              anchor_date
13236647    123456        2020-06-24T12:47:18   
16253983    123456        2020-07-19T05:54:52   
16720335    123456        2020-08-20T02:02:06   
17823059    123456        2020-09-17T02:02:04    2020-09-17T02:02:04
18523920    123456        2020-10-12T13:53:19   
19324467    123456        2020-11-12T01:02:18   
20234536    123456        2020-12-04T01:02:42   
70523487    654321        2015-09-21T09:25:25   
71234048    654321        2015-10-01T19:02:28   
14145443    654321        2020-03-28T10:21:57   
14134525    654321        2020-03-28T10:31:33   
11244748    654321        2020-04-03T06:20:57    2020-04-03T06:20:57

The output I want looks like this:

rows_before_anchoranchor_date 之前的所有行进行编号,按 placed_at 排序并按 contact_id.

分组

rows_after_anchoranchor_date 之后的所有行进行编号,按 placed_at 排序,按 contact_id

分组
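To pin the requirement down, here is a plain-Python sketch of the numbering described above, applied to the sample rows. It is only a reference for checking a SQL answer against, not a Redshift solution. It assumes exactly one row per contact carries anchor_date and that all timestamps use the same ISO-8601 format, so they compare chronologically as plain strings; `number_around_anchor` is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

# (order_id, contact_id, placed_at, anchor_date) from the sample table; None = empty cell
ROWS = [
    (13236647, 123456, "2020-06-24T12:47:18", None),
    (16253983, 123456, "2020-07-19T05:54:52", None),
    (16720335, 123456, "2020-08-20T02:02:06", None),
    (17823059, 123456, "2020-09-17T02:02:04", "2020-09-17T02:02:04"),
    (18523920, 123456, "2020-10-12T13:53:19", None),
    (19324467, 123456, "2020-11-12T01:02:18", None),
    (20234536, 123456, "2020-12-04T01:02:42", None),
    (70523487, 654321, "2015-09-21T09:25:25", None),
    (71234048, 654321, "2015-10-01T19:02:28", None),
    (14145443, 654321, "2020-03-28T10:21:57", None),
    (14134525, 654321, "2020-03-28T10:31:33", None),
    (11244748, 654321, "2020-04-03T06:20:57", "2020-04-03T06:20:57"),
]


def number_around_anchor(rows):
    """Map order_id -> (rows_before_anchor, rows_after_anchor); None marks a blank cell."""
    result = {}
    ordered = sorted(rows, key=itemgetter(1, 2))  # contact_id, then placed_at
    for _, group in groupby(ordered, key=itemgetter(1)):
        group = list(group)
        # Assumption: exactly one row per contact carries the anchor_date.
        anchor = next(r[3] for r in group if r[3] is not None)
        before = after = 0
        for order_id, _, placed_at, _ in group:
            # ISO-8601 strings in one format compare chronologically as strings
            if placed_at < anchor:
                before += 1
                result[order_id] = (before, None)
            elif placed_at > anchor:
                after += 1
                result[order_id] = (None, after)
            else:
                result[order_id] = (None, None)  # the anchor row itself
    return result


numbering = number_around_anchor(ROWS)
for order_id, pair in numbering.items():
    print(order_id, pair)
```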

Here is my attempt:

SELECT
  order_id,
  contact_id,
  placed_at,
  ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY placed_at < anchor_date) AS rows_before_anchor,
  ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY placed_at > anchor_date) AS rows_after_anchor
FROM mytable
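One reason this attempt can't work: anchor_date is filled in only on the anchor row, so `placed_at < anchor_date` (and `>`) evaluates to NULL on every other row. All the non-anchor rows therefore share the same ORDER BY key, and ROW_NUMBER simply numbers them across the whole contact in an unspecified order instead of restarting around the anchor. The sketch below shows the NULL keys using SQLite as a stand-in for Redshift (an assumption; SQLite returns comparisons as 0/1 or NULL) on a cut-down copy of mytable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down copy of mytable: only the anchor row has anchor_date filled in
conn.execute(
    "create table mytable (order_id int, contact_id int, placed_at text, anchor_date text)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [
        (13236647, 123456, "2020-06-24T12:47:18", None),
        (17823059, 123456, "2020-09-17T02:02:04", "2020-09-17T02:02:04"),
        (18523920, 123456, "2020-10-12T13:53:19", None),
    ],
)

# The ORDER BY key from the attempt: comparing against a NULL anchor_date yields NULL
keys = conn.execute(
    "select order_id, placed_at < anchor_date from mytable order by order_id"
).fetchall()
print(keys)  # NULL (None) for both non-anchor rows, 0 for the anchor row
```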

The table I want:

order_id  contact_id  placed_at            anchor_date         rows_before_anchor rows_after_anchor
13236647  123456      2020-06-24T12:47:18                      1                  
16253983  123456      2020-07-19T05:54:52                      2    
16720335  123456      2020-08-20T02:02:06                      3    
17823059  123456      2020-09-17T02:02:04  2020-09-17T02:02:04
18523920  123456      2020-10-12T13:53:19                                         1
19324467  123456      2020-11-12T01:02:18                                         2             
20234536  123456      2020-12-04T01:02:42                                         3
70523487  654321      2015-09-21T09:25:25                      1
71234048  654321      2015-10-01T19:02:28                      2            
14145443  654321      2020-03-28T10:21:57                      3
14134525  654321      2020-03-28T10:31:33                      4
11244748  654321      2020-04-03T06:20:57  2020-04-03T06:20:57

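Here is one way you could do it. First, identify the rows on each side of the anchor date and give each side a common grouping value; the CTE below does this. Subtracting a row's number within the rows that share its anchor_date from its number among all of the contact's rows gives 0 for every row before the anchor and 1 for every row after it, because the anchor row is counted in the first numbering but not in the second. Once you have that grouping, use it as a partition to apply the numbering you need.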
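As a quick check of the grouping step on its own, the sketch below computes gnum for contact 123456 using SQLite (3.25 or later, which supports window functions) as a stand-in for Redshift; that substitution is an assumption, though the window syntax used is standard. Rows with a NULL anchor_date fall into a single partition in the second ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (order_id int, contact_id int, placed_at text, anchor_date text)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (13236647, 123456, "2020-06-24T12:47:18", None),
        (16253983, 123456, "2020-07-19T05:54:52", None),
        (16720335, 123456, "2020-08-20T02:02:06", None),
        (17823059, 123456, "2020-09-17T02:02:04", "2020-09-17T02:02:04"),
        (18523920, 123456, "2020-10-12T13:53:19", None),
        (19324467, 123456, "2020-11-12T01:02:18", None),
        (20234536, 123456, "2020-12-04T01:02:42", None),
    ],
)

# Overall position minus position among rows sharing the same anchor_date
gnums = conn.execute(
    """
    select order_id,
           row_number() over (partition by contact_id order by placed_at)
           - row_number() over (partition by contact_id, anchor_date order by placed_at) as gnum
    from t
    order by placed_at
    """
).fetchall()
for order_id, gnum in gnums:
    print(order_id, gnum)
```

The three orders before the anchor get 0 and the three after it get 1. The anchor row's own gnum is its position minus one (3 here), so for a contact whose anchor is its second order it would also be 1; the anchor row must therefore be kept out of the after-anchor partition when numbering.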

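From your sample data it isn't clear whether the "blank" cells should be zero or an empty string. Since a row number is by definition an integer, I have defaulted the blanks to zero; if you really want blanks, cast the row number to varchar and return an empty string in the else branch (or drop the else entirely to get NULL).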

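-- t is your table (mytable in the question)
with grp as (
    select *,
        -- position among all of the contact's orders minus position among the
        -- orders sharing the same anchor_date: 0 before the anchor, 1 after it
        Row_Number() over(partition by contact_id order by placed_at)
        - Row_Number() over(partition by contact_id, anchor_date order by placed_at) as gnum
    from t
)

select order_id, contact_id, placed_at, anchor_date,
    -- anchor_date is in the partition so the anchor row (whose gnum is its
    -- position minus one, and so can also be 1) is never numbered with the
    -- rows after it
    case when anchor_date is null and gnum = 0 then
        Row_Number() over(partition by contact_id, anchor_date, gnum order by placed_at)
    else 0 end as rows_before_anchor,
    case when anchor_date is null and gnum > 0 then
        Row_Number() over(partition by contact_id, anchor_date, gnum order by placed_at)
    else 0 end as rows_after_anchor
from grp
order by contact_id, placed_at;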

As far as I know there's no Fiddle for Amazon Redshift, but see this example DB<>Fiddle, which uses SQL Server; it should have the same or similar syntax.
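If you'd rather check the approach locally, the sketch below runs the query through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions; SQLite standing in for Redshift is an assumption). This version partitions the final ROW_NUMBER by contact_id, anchor_date and gnum, so the anchor row is kept apart from the rows after it. Besides the sample data it adds a hypothetical contact 999 whose anchor is its second order, the case where the anchor row's gnum is also 1.

```python
import sqlite3

ROWS = [
    # (order_id, contact_id, placed_at, anchor_date) from the question's sample table
    (13236647, 123456, "2020-06-24T12:47:18", None),
    (16253983, 123456, "2020-07-19T05:54:52", None),
    (16720335, 123456, "2020-08-20T02:02:06", None),
    (17823059, 123456, "2020-09-17T02:02:04", "2020-09-17T02:02:04"),
    (18523920, 123456, "2020-10-12T13:53:19", None),
    (19324467, 123456, "2020-11-12T01:02:18", None),
    (20234536, 123456, "2020-12-04T01:02:42", None),
    (70523487, 654321, "2015-09-21T09:25:25", None),
    (71234048, 654321, "2015-10-01T19:02:28", None),
    (14145443, 654321, "2020-03-28T10:21:57", None),
    (14134525, 654321, "2020-03-28T10:31:33", None),
    (11244748, 654321, "2020-04-03T06:20:57", "2020-04-03T06:20:57"),
    # hypothetical extra contact whose anchor is its *second* order
    (1, 999, "2021-01-01T00:00:00", None),
    (2, 999, "2021-02-01T00:00:00", "2021-02-01T00:00:00"),
    (3, 999, "2021-03-01T00:00:00", None),
    (4, 999, "2021-04-01T00:00:00", None),
]

QUERY = """
with grp as (
    select *,
        row_number() over (partition by contact_id order by placed_at)
        - row_number() over (partition by contact_id, anchor_date order by placed_at) as gnum
    from t
)
select order_id,
    case when anchor_date is null and gnum = 0 then
        row_number() over (partition by contact_id, anchor_date, gnum order by placed_at)
    else 0 end as rows_before_anchor,
    case when anchor_date is null and gnum > 0 then
        row_number() over (partition by contact_id, anchor_date, gnum order by placed_at)
    else 0 end as rows_after_anchor
from grp
order by contact_id, placed_at
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (order_id int, contact_id int, placed_at text, anchor_date text)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)

# order_id -> (rows_before_anchor, rows_after_anchor); 0 stands for a blank cell
numbering = {oid: (before, after) for oid, before, after in conn.execute(QUERY)}
for oid, pair in numbering.items():
    print(oid, pair)
```

Keying the result by order_id keeps the check independent of output order; contact 999 sorts first because contact_id is numeric.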