Add variable to count consecutive months
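I have a query in a Postgres database that combines client subscriptions.
I want to add a variable called "consecutive months", but I'm not sure how to add it in Postgres.
My original table looks like this: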
client | product | Date |
---|---|---|
1 | Sub | 2020-10-01 |
1 | Sub | 2020-11-01 |
2 | Sub | 2020-11-01 |
2 | Sub | 2020-12-01 |
1 | Sub | 2021-01-01 |
1 | Sub | 2021-02-01 |
2 | Sub | 2021-02-01 |
And I would like something that counts the consecutive months from the start of each run, like this:
client | product | Date | Consecutive_months |
---|---|---|---|
1 | Sub | 2020-10-01 | 1 |
1 | Sub | 2020-11-01 | 2 |
2 | Sub | 2020-11-01 | 1 |
2 | Sub | 2020-12-01 | 2 |
1 | Sub | 2021-01-01 | 1 |
1 | Sub | 2021-02-01 | 2 |
2 | Sub | 2021-02-01 | 1 |
Thanks for your help!
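Judging by the tags, the OP evidently recognizes this as a gaps-and-islands problem. This query extracts the year and month to build a number that increases by one each month. After that, the standard difference logic finds the rows that fall out of step and produces a marker for each island.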
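-- here T is the source table and dt its date column
with A as (
    select *,
        -- month index minus row number: constant within a run of consecutive months
        date_part('year', dt) * 12 + date_part('month', dt)
          - row_number() over (partition by client, product order by dt) as grp
    from T
)
select *,
    -- restart the count at 1 for every island
    row_number()
        over (partition by client, product, grp order by dt) as consecutive_months
from A;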
If it is acceptable for a given client and product to have multiple rows within the same month, switch row_number() to dense_rank() in both places in the query above.
https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=397a2f3282cab3b70bd7a47d1dc5ea0a
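To make the difference logic easier to follow, here is a minimal Python sketch that traces the same arithmetic on the sample rows from the question. It is only an illustration, not a replacement for the SQL: the names `rows` and `consecutive_months` are made up for this example, and it emulates `row_number()` with a simple per-partition counter.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (client, product, date).
rows = [
    (1, "Sub", date(2020, 10, 1)),
    (1, "Sub", date(2020, 11, 1)),
    (2, "Sub", date(2020, 11, 1)),
    (2, "Sub", date(2020, 12, 1)),
    (1, "Sub", date(2021, 1, 1)),
    (1, "Sub", date(2021, 2, 1)),
    (2, "Sub", date(2021, 2, 1)),
]


def consecutive_months(rows):
    """Return (client, product, date, grp, consecutive_months) tuples."""
    out = []
    ordered = sorted(rows)  # sorts by client, then product, then date
    # Each (client, product) pair plays the role of one window partition.
    for _, part in groupby(ordered, key=lambda r: (r[0], r[1])):
        counts = {}  # island marker -> rows seen so far in that island
        for rn, (client, product, dt) in enumerate(part, start=1):
            # The month index rises by 1 per calendar month, so subtracting
            # the row number gives a value that changes only after a gap.
            grp = dt.year * 12 + dt.month - rn
            counts[grp] = counts.get(grp, 0) + 1
            out.append((client, product, dt, grp, counts[grp]))
    return out


result = consecutive_months(rows)
for row in result:
    print(row)
```

For client 1 the marker `grp` is 24249 for the two 2020 rows and 24250 for the two 2021 rows, so the counter restarts after the skipped December.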
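It looks like you have a Gaps-And-Islands type of problem.
The trick is to calculate a rank based on how the dates connect for each client.
Then a sequence number can be calculated from the client and that rank.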
select client, product, "Date"
     , row_number() over (partition by client, daterank order by "Date") as Consecutive_months
from
(
    select "Date", client, product
         -- the rank rises by 1 per row while the months elapsed fall by 1 per month,
         -- so the sum stays constant within a run of consecutive months
         , dense_rank() over (partition by client order by "Date")
           + (DATE_PART('year', AGE(current_date, "Date"))*12 +
              DATE_PART('month', AGE(current_date, "Date"))) daterank
    from raw t
) q
order by "Date", client
client | product | Date | consecutive_months
-----: | :------ | :--------- | -----------------:
1 | Sub | 2020-10-01 | 1
1 | Sub | 2020-11-01 | 2
2 | Sub | 2020-11-01 | 1
2 | Sub | 2020-12-01 | 2
1 | Sub | 2021-01-01 | 1
1 | Sub | 2021-02-01 | 2
2 | Sub | 2021-02-01 | 1
db<>fiddle here
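The reason `daterank` works as an island marker may be easier to see with concrete numbers. The sketch below recomputes it in Python for client 1 only. It rests on a few assumptions that are not in the answer: `reference` is an arbitrary stand-in for `current_date`, `months_between` is a hypothetical helper that matches the year and month parts of `AGE` only when every row falls on the first of a month (as in the sample data), and `dense_rank()` is emulated by enumerating the distinct sorted dates.

```python
from datetime import date


def months_between(later, earlier):
    """Whole months from earlier to later; assumes earlier is a first-of-month date."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


reference = date(2021, 6, 15)  # assumed stand-in for current_date
client_1_dates = [
    date(2020, 10, 1),
    date(2020, 11, 1),
    date(2021, 1, 1),
    date(2021, 2, 1),
]

# Rank goes up by 1 per row, months elapsed goes down by 1 per month,
# so their sum only changes when a month is skipped.
dateranks = []
for rank, d in enumerate(sorted(set(client_1_dates)), start=1):
    dateranks.append(rank + months_between(reference, d))

# Number the rows within each daterank value, like row_number() per island.
seen = {}
consecutive = []
for key in dateranks:
    seen[key] = seen.get(key, 0) + 1
    consecutive.append(seen[key])

print(dateranks)    # [9, 9, 8, 8]
print(consecutive)  # [1, 2, 1, 2]
```

The absolute `daterank` values depend on the reference date, but the grouping does not, which is why the query can safely use `current_date`.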