Need the last value of a Gap and Island
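I'm a bit stuck. I've made good progress on this query, but I'm not getting the result I expect. I'm using BigQuery Standard SQL. The table looks like this:
ID - unique ID of the account
Stage - where the account is in the customer lifecycle
CreatedDate - when the stage was updated
EndDate - when the stage ended
Amount - the amount for that stage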
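ID | Stage | CreatedDate | EndDate | Amount |
---|---|---|---|---|
egj3k | Discovery | 2020-12-08 | 2020-12-08 | 5000 |
egj3k | Discovery | 2020-12-08 | 2020-12-10 | 6500 |
egj3k | Proposal | 2020-12-10 | 2020-12-11 | 6500 |
egj3k | Proposal | 2020-12-11 | 2020-12-15 | 8000 |
egj3k | Negotiation | 2020-12-15 | 2020-12-21 | 7500 |
egj3k | Onboarding | 2020-12-21 | 2020-12-21 | 8000 |
egj3k | Onboarding | 2020-12-21 | 2020-12-22 | 8000 |
egj3k | Onboarding | 2020-12-21 | 2020-12-23 | 10000 |
egj3k | Onboarding | 2020-12-23 | 2020-12-25 | 11000 |
egj3k | Revenue | 2020-12-25 | 2021-01-31 | 15000 |
egj3k | Stalled | 2021-01-31 | 2021-02-05 | 7000 |
egj3k | Revenue | 2021-02-05 | 2021-03-05 | 12000 |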
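The first problem was to find the minimum CreatedDate and maximum EndDate of each stage and then show the lifecycle. I solved that like this: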
WITH CTE AS
(
  SELECT ID,
    Stage,
    CreatedDate,
    EndDate,
    -- difference of two row numbers: constant within a run of consecutive rows with the same Stage
    ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate)
      - ROW_NUMBER() OVER(PARTITION BY ID, Stage ORDER BY CreatedDate) As WindowId,
    -- same idea, counted from the end
    ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate DESC)
      - ROW_NUMBER() OVER(PARTITION BY ID, Stage ORDER BY CreatedDate DESC) As ReverseWindowId
  FROM `BQ_TABLE`
)
SELECT DISTINCT ID,
  Stage,
  MIN(CreatedDate) OVER(PARTITION BY WindowId, ReverseWindowId) As StartDate,
  -- a NULL EndDate is treated as open-ended so it wins MAX(), then mapped back to NULL
  NULLIF(MAX(IFNULL(EndDate, '9999-12-31')) OVER(PARTITION BY WindowId, ReverseWindowId), '9999-12-31') As EndDate
FROM CTE
where ID = 'egj3k'
ORDER BY StartDate
This produces:
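ID | Stage | CreatedDate | EndDate |
---|---|---|---|
egj3k | Discovery | 2020-12-08 | 2020-12-10 |
egj3k | Proposal | 2020-12-10 | 2020-12-15 |
egj3k | Negotiation | 2020-12-15 | 2020-12-21 |
egj3k | Onboarding | 2020-12-21 | 2020-12-25 |
egj3k | Revenue | 2020-12-25 | 2021-01-31 |
egj3k | Stalled | 2021-01-31 | 2021-02-05 |
egj3k | Revenue | 2021-02-05 | 2021-03-05 |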
Perfect. But now I need the last Amount of each window, so I'm trying to produce this:
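ID | Stage | CreatedDate | EndDate | Amount |
---|---|---|---|---|
egj3k | Discovery | 2020-12-08 | 2020-12-10 | 6500 |
egj3k | Proposal | 2020-12-10 | 2020-12-15 | 8000 |
egj3k | Negotiation | 2020-12-15 | 2020-12-21 | 7500 |
egj3k | Onboarding | 2020-12-21 | 2020-12-25 | 11000 |
egj3k | Revenue | 2020-12-25 | 2021-01-31 | 15000 |
egj3k | Stalled | 2021-01-31 | 2021-02-05 | 7000 |
egj3k | Revenue | 2021-02-05 | 2021-03-05 | 12000 |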
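I tried ranking the rows within the window and then filtering on the rank in a separate select. I also tried LAST_VALUE, but it did not return the right values, because, as you can see, Revenue appears in two separate groups.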
rank() OVER (PARTITION BY ID, Stage order by CreatedDate Desc) as Rank_
Where Rank_ = 1
Help? ;) Is there a better way?
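The RANK() attempt partitions by ID and Stage, so every row of a stage falls into one partition even when the stage comes back later. Here is a small sketch of that behaviour on the sample rows. It is plain Python rather than BigQuery, and the variable names are made up for illustration:

```python
# Sample rows from the question: (Stage, CreatedDate, EndDate, Amount) for ID egj3k.
rows = [
    ("Discovery",   "2020-12-08", "2020-12-08",  5000),
    ("Discovery",   "2020-12-08", "2020-12-10",  6500),
    ("Proposal",    "2020-12-10", "2020-12-11",  6500),
    ("Proposal",    "2020-12-11", "2020-12-15",  8000),
    ("Negotiation", "2020-12-15", "2020-12-21",  7500),
    ("Onboarding",  "2020-12-21", "2020-12-21",  8000),
    ("Onboarding",  "2020-12-21", "2020-12-22",  8000),
    ("Onboarding",  "2020-12-21", "2020-12-23", 10000),
    ("Onboarding",  "2020-12-23", "2020-12-25", 11000),
    ("Revenue",     "2020-12-25", "2021-01-31", 15000),
    ("Stalled",     "2021-01-31", "2021-02-05",  7000),
    ("Revenue",     "2021-02-05", "2021-03-05", 12000),
]

# Rough equivalent of filtering on
#   RANK() OVER (PARTITION BY ID, Stage ORDER BY CreatedDate DESC) = 1
# Keep the row with the latest CreatedDate per stage. Unlike RANK(), which
# returns every tied row, this keeps only the first row among ties.
latest_per_stage = {}
for stage, created, end, amount in rows:
    kept = latest_per_stage.get(stage)
    if kept is None or created > kept[0]:   # ISO dates compare correctly as strings
        latest_per_stage[stage] = (created, amount)

for stage, (created, amount) in latest_per_stage.items():
    print(stage, created, amount)
```

Only six stages come out, and the first Revenue period (15000) is gone, because both Revenue periods share one partition. The partition key has to identify the island, not just the stage.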
WITH CTE AS
(
  SELECT ID,
    Stage,
    CreatedDate,
    EndDate,
    Amount,
    ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate)
      - ROW_NUMBER() OVER(PARTITION BY ID, Stage ORDER BY CreatedDate) As WindowId,
    ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate DESC)
      - ROW_NUMBER() OVER(PARTITION BY ID, Stage ORDER BY CreatedDate DESC) As ReverseWindowId
  FROM `BQ_Table`
),
CTE_2 as (
  SELECT ID,
    Stage,
    MIN(CreatedDate) OVER(PARTITION BY WindowId, ReverseWindowId) As StartDate,
    NULLIF(MAX(IFNULL(EndDate, '9999-12-31')) OVER(PARTITION BY WindowId, ReverseWindowId), '9999-12-31') As EndDate,
    Amount,
    -- number the rows of each island from the latest EndDate backwards
    row_number() OVER (PARTITION BY ReverseWindowId, WindowId ORDER BY EndDate desc) as Rev_rank
  FROM CTE
  ORDER BY StartDate)
SELECT
  ID,
  Stage,
  StartDate,
  EndDate,
  Amount
FROM CTE_2
-- keep only the last row of each island
Where Rev_rank = 1
It works, but it's ugly. I'm still interested to see whether anyone has a better idea.
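For anyone wondering why subtracting two ROW_NUMBER() values gives an island id at all, here is a short re-enactment in plain Python (not BigQuery; the helper name island_keys is hypothetical). It numbers the rows overall and per stage, then takes the difference:

```python
from collections import defaultdict

# Stage of each sample row for ID egj3k, already sorted by CreatedDate.
stages = [
    "Discovery", "Discovery", "Proposal", "Proposal", "Negotiation",
    "Onboarding", "Onboarding", "Onboarding", "Onboarding",
    "Revenue", "Stalled", "Revenue",
]

def island_keys(stages):
    """Return (stage, overall row number - per-stage row number) for each row."""
    per_stage = defaultdict(int)
    keys = []
    for seqnum, stage in enumerate(stages, start=1):  # ROW_NUMBER() over the whole ID
        per_stage[stage] += 1                         # ROW_NUMBER() within (ID, Stage)
        keys.append((stage, seqnum - per_stage[stage]))
    return keys

keys = island_keys(stages)
for key in keys:
    print(key)
```

Within a run of the same stage both counters advance together, so the difference stays constant; when another stage interrupts, only the overall counter moves and the difference jumps. Note that the difference alone is not unique: here Stalled and the second Revenue period both get 10, which is why the stage (or a second difference such as ReverseWindowId) has to be part of the grouping key.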
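You can use an array trick to get the last value in each island. I find your logic rather cumbersome: I'm not sure why you use ROW_NUMBER() four times, why you check EndDate for NULL values, or why you use SELECT DISTINCT rather than GROUP BY.
In any case, this should do what you want: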
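WITH CTE AS (
  SELECT t.*,
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate) as seqnum,
    ROW_NUMBER() OVER (PARTITION BY ID, Stage ORDER BY CreatedDate) as seqnum_2
  FROM `BQ_TABLE` t
)
-- seqnum - seqnum_2 is constant within each run of consecutive rows with the same Stage
SELECT ID, Stage, MIN(CreatedDate), MAX(EndDate),  -- start and end of the island
  -- sort each island newest-first and keep one element; EndDate breaks ties on CreatedDate
  ARRAY_AGG(Amount ORDER BY CreatedDate DESC, EndDate DESC LIMIT 1)[SAFE_ORDINAL(1)]
FROM CTE
GROUP BY ID, Stage, (seqnum - seqnum_2);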
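One thing to watch with the ARRAY_AGG(... ORDER BY ... DESC LIMIT 1) pattern: in the sample data both Discovery rows share the same CreatedDate, so ordering by CreatedDate alone does not say which one is "last". A minimal Python sketch of the pattern (the function name last_amount is hypothetical, and Python's stable sort stands in for what is an unspecified choice in SQL):

```python
# The two Discovery rows from the sample data: (CreatedDate, EndDate, Amount).
discovery = [
    ("2020-12-08", "2020-12-08", 5000),
    ("2020-12-08", "2020-12-10", 6500),
]

def last_amount(rows, key):
    """Sort descending by `key` and return the first Amount, or None if there are no rows."""
    ordered = sorted(rows, key=key, reverse=True)   # ORDER BY ... DESC
    return ordered[0][2] if ordered else None       # LIMIT 1, then a safe first-element lookup

# Ordering by CreatedDate only: the rows tie, so the winner is arbitrary.
by_created = last_amount(discovery, key=lambda r: r[0])

# Adding EndDate as a tie-breaker makes the choice deterministic.
by_created_then_end = last_amount(discovery, key=lambda r: (r[0], r[1]))

print(by_created, by_created_then_end)
```

With this input order the first call picks 5000 and the second picks 6500, which is the value the desired output expects for Discovery. Ordering by EndDate, or by CreatedDate and then EndDate, avoids depending on how ties happen to be resolved.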
Consider the approach below:
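select id, stage, min(CreatedDate) CreatedDate, max(EndDate) EndDate,
  -- amount of the row with the latest EndDate in the island
  array_agg(amount order by EndDate desc limit 1)[offset(0)] amount
from (
  -- running count of stage changes: every island gets its own stage_group number
  select *, countif(new_stage) over(partition by id order by CreatedDate, EndDate) stage_group
  from (
    select *,
      -- true on the first row of an id and whenever the stage differs from the previous row
      ifnull(stage != lag(stage) over(partition by id order by CreatedDate, EndDate), true) new_stage
    from `BQ_Table`
  )
)
group by id, stage, stage_group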
When applied to the sample data in your question, this returns the seven rows of the desired output.
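To make the LAG plus COUNTIF step concrete, here is a sketch of the same two passes in plain Python (not BigQuery). The list names mirror the new_stage and stage_group aliases in the query; everything else is illustrative:

```python
from itertools import accumulate

# Stage of each sample row for ID egj3k, ordered by CreatedDate, EndDate.
stages = [
    "Discovery", "Discovery", "Proposal", "Proposal", "Negotiation",
    "Onboarding", "Onboarding", "Onboarding", "Onboarding",
    "Revenue", "Stalled", "Revenue",
]

# new_stage: True on the first row (no previous row, like LAG returning NULL)
# and whenever the stage differs from the previous row.
new_stage = [i == 0 or stage != stages[i - 1] for i, stage in enumerate(stages)]

# stage_group: running count of True flags, like COUNTIF(new_stage) OVER (ORDER BY ...).
stage_group = list(accumulate(int(flag) for flag in new_stage))

for stage, group in zip(stages, stage_group):
    print(group, stage)
```

The running count gives 1 to 7, one number per island, so the two Revenue periods land in groups 5 and 7. Unlike the row-number difference, this number is unique per island on its own.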