How to use row_number to count dates after a desired anchor date?
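I have a dataset in which I want to number the rows before and after an anchor date. I think a window function with row_number() could work, but I'm not sure how to write it.
My current table: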
order_id  contact_id  placed_at            anchor_date
13236647  123456      2020-06-24T12:47:18
16253983  123456      2020-07-19T05:54:52
16720335  123456      2020-08-20T02:02:06
17823059  123456      2020-09-17T02:02:04  2020-09-17T02:02:04
18523920  123456      2020-10-12T13:53:19
19324467  123456      2020-11-12T01:02:18
20234536  123456      2020-12-04T01:02:42
70523487  654321      2015-09-21T09:25:25
71234048  654321      2015-10-01T19:02:28
14145443  654321      2020-03-28T10:21:57
14134525  654321      2020-03-28T10:31:33
11244748  654321      2020-04-03T06:20:57  2020-04-03T06:20:57
My desired output looks like this:
rows_before_anchor numbers all rows before anchor_date, ordered by placed_at and partitioned by contact_id.
rows_after_anchor numbers all rows after anchor_date, ordered by placed_at and partitioned by contact_id.
Here is my attempt:
SELECT
order_id,
contact_id,
placed_at,
ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY placed_at < anchor_date) AS rows_before_anchor,
ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY placed_at > anchor_date) AS rows_after_anchor
FROM mytable
My desired table:
order_id  contact_id  placed_at            anchor_date          rows_before_anchor  rows_after_anchor
13236647  123456      2020-06-24T12:47:18                       1
16253983  123456      2020-07-19T05:54:52                       2
16720335  123456      2020-08-20T02:02:06                       3
17823059  123456      2020-09-17T02:02:04  2020-09-17T02:02:04
18523920  123456      2020-10-12T13:53:19                                           1
19324467  123456      2020-11-12T01:02:18                                           2
20234536  123456      2020-12-04T01:02:42                                           3
70523487  654321      2015-09-21T09:25:25                       1
71234048  654321      2015-10-01T19:02:28                       2
14145443  654321      2020-03-28T10:21:57                       3
14134525  654321      2020-03-28T10:31:33                       4
11244748  654321      2020-04-03T06:20:57  2020-04-03T06:20:57
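Here is one way you can do this. First, you need to identify all the rows on either side of the anchor date and assign them a common grouping, which is done in the CTE below. Once you have this grouping, you can apply the desired numbering by using it as a partition.
It's not clear from your sample data whether the missing row numbers should be zero or an empty string. Since row numbers are integers by definition, I have defaulted the blank values to zero. If you really want blanks, just cast the row number to varchar.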
with grp as (
  -- for non-anchor rows, gnum is 0 before the anchor and greater than 0 after it
  select *,
    Row_Number() over(partition by contact_id order by placed_at)
    - Row_Number() over(partition by contact_id, anchor_date order by placed_at) gnum
  from t
)
select order_id, contact_id, placed_at, anchor_date,
  -- number rows within each (contact_id, gnum) group; all other rows get 0
  case when anchor_date is null and gnum=0 then
    Row_Number() over(partition by contact_id, gnum order by placed_at)
  else 0 end as rows_before_anchor,
  case when anchor_date is null and gnum>0 then
    Row_Number() over(partition by contact_id, gnum order by placed_at)
  else 0 end as rows_after_anchor
from grp
order by contact_id, placed_at;
As far as I know there is no fiddle for Amazon Redshift, but see this example DB<>Fiddle using SQL Server, which should have the same or similar syntax.
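If you want to check the query locally, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's built-in sqlite3 module and runs the same CTE. This rests on a few assumptions: SQLite is neither Redshift nor SQL Server, window functions need SQLite 3.25 or later, the timestamps are stored as ISO-8601 text (which sorts correctly as strings), and the helper name run_query is just for illustration.

```python
import sqlite3

# Sample rows from the question: (order_id, contact_id, placed_at, anchor_date)
ROWS = [
    (13236647, 123456, "2020-06-24T12:47:18", None),
    (16253983, 123456, "2020-07-19T05:54:52", None),
    (16720335, 123456, "2020-08-20T02:02:06", None),
    (17823059, 123456, "2020-09-17T02:02:04", "2020-09-17T02:02:04"),
    (18523920, 123456, "2020-10-12T13:53:19", None),
    (19324467, 123456, "2020-11-12T01:02:18", None),
    (20234536, 123456, "2020-12-04T01:02:42", None),
    (70523487, 654321, "2015-09-21T09:25:25", None),
    (71234048, 654321, "2015-10-01T19:02:28", None),
    (14145443, 654321, "2020-03-28T10:21:57", None),
    (14134525, 654321, "2020-03-28T10:31:33", None),
    (11244748, 654321, "2020-04-03T06:20:57", "2020-04-03T06:20:57"),
]

# Same logic as the answer's query; only the "as gnum" alias is spelled out.
QUERY = """
with grp as (
  select *,
    row_number() over (partition by contact_id order by placed_at)
    - row_number() over (partition by contact_id, anchor_date order by placed_at) as gnum
  from t
)
select order_id, contact_id, placed_at, anchor_date,
  case when anchor_date is null and gnum = 0 then
    row_number() over (partition by contact_id, gnum order by placed_at)
  else 0 end as rows_before_anchor,
  case when anchor_date is null and gnum > 0 then
    row_number() over (partition by contact_id, gnum order by placed_at)
  else 0 end as rows_after_anchor
from grp
order by contact_id, placed_at
"""


def run_query():
    """Create the table t in memory, load the sample rows, and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (order_id integer, contact_id integer, "
        "placed_at text, anchor_date text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_query():
        print(row)
```

Each printed tuple ends with rows_before_anchor and rows_after_anchor, so you can compare them against the desired table. Note the approach relies on there being a single anchor row per contact_id, as in the sample data.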