Need the last value of a Gap and Island

I'm a bit stuck right now. I've made good progress on this query, but I'm not getting the results I expect. I'm using BQ Standard SQL. The table looks like this.

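ID - unique ID of the account
Stage - where the account is in the customer lifecycle
CreatedDate - when the stage was updated
EndDate - when the stage ended
Amount - the amount at that stage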

ID     Stage        CreatedDate  EndDate     Amount
egj3k  Discovery    2020-12-08   2020-12-08  5000
egj3k  Discovery    2020-12-08   2020-12-10  6500
egj3k  Proposal     2020-12-10   2020-12-11  6500
egj3k  Proposal     2020-12-11   2020-12-15  8000
egj3k  Negotiation  2020-12-15   2020-12-21  7500
egj3k  Onboarding   2020-12-21   2020-12-21  8000
egj3k  Onboarding   2020-12-21   2020-12-22  8000
egj3k  Onboarding   2020-12-21   2020-12-23  10000
egj3k  Onboarding   2020-12-23   2020-12-25  11000
egj3k  Revenue      2020-12-25   2021-01-31  15000
egj3k  Stalled      2021-01-31   2021-02-05  7000
egj3k  Revenue      2021-02-05   2021-03-05  12000

The first problem was finding the minimum and maximum created date for each stage and then showing the lifecycle, which I solved like this:

WITH CTE AS
    (
    SELECT  ID, 
            Stage, 
            CreatedDate, 
            EndDate,
            ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate)
            - ROW_NUMBER() OVER(PARTITION BY ID, Stage ORDER BY CreatedDate) As WindowId,
            ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate DESC)
            - ROW_NUMBER() OVER(PARTITION BY ID, Stage ORDER BY CreatedDate DESC) As ReverseWindowId
    FROM `BQ_TABLE`
    )
    
    SELECT  DISTINCT ID,
            Stage,
            MIN(CreatedDate) OVER(PARTITION BY WindowId, ReverseWindowId) As StartDate,
            NULLIF(MAX(IFNULL(EndDate, '9999-12-31')) OVER(PARTITION BY WindowId, ReverseWindowId), '9999-12-31') As EndDate
    
    FROM CTE 

    where ID = 'egj3k'
    ORDER BY StartDate

Which produces this:

ID     Stage        CreatedDate  EndDate
egj3k  Discovery    2020-12-08   2020-12-10
egj3k  Proposal     2020-12-10   2020-12-15
egj3k  Negotiation  2020-12-15   2020-12-21
egj3k  Onboarding   2020-12-21   2020-12-25
egj3k  Revenue      2020-12-25   2021-01-31
egj3k  Stalled      2021-01-31   2021-02-05
egj3k  Revenue      2021-02-05   2021-03-05

Perfect. But now I need to find the last Amount of each window. So I'm trying to produce this:

ID     Stage        CreatedDate  EndDate     Amount
egj3k  Discovery    2020-12-08   2020-12-10  6500
egj3k  Proposal     2020-12-10   2020-12-15  8000
egj3k  Negotiation  2020-12-15   2020-12-21  7500
egj3k  Onboarding   2020-12-21   2020-12-25  11000
egj3k  Revenue      2020-12-25   2021-01-31  15000
egj3k  Stalled      2021-01-31   2021-02-05  7000
egj3k  Revenue      2021-02-05   2021-03-05  12000

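I tried ranking the rows within the window and then selecting on that rank in a separate step. I also tried LAST_VALUE, but it did not return the right value, because, as you can see, Revenue appears in two separate islands.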

rank() OVER (PARTITION BY ID, Stage order by CreatedDate Desc) as Rank_

Where Rank_ = 1
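
To illustrate why a rank partitioned only by ID and Stage is not enough, here is a minimal sketch. It assumes three rows taken from the sample data above, supplied inline (with stage names written in English) instead of read from the real table; `sample_rows`, `island_id`, `rank_by_stage` and `rank_by_island` are illustrative names, not part of the original query.

```sql
-- Sketch: compare a rank per (ID, Stage) with a rank per island.
-- The inline rows are an assumed subset of the sample data in the question.
WITH sample_rows AS (
  SELECT * FROM UNNEST([
    STRUCT('egj3k' AS ID, 'Revenue' AS Stage, DATE '2020-12-25' AS CreatedDate,
           DATE '2021-01-31' AS EndDate, 15000 AS Amount),
    STRUCT('egj3k', 'Stalled', DATE '2021-01-31', DATE '2021-02-05', 7000),
    STRUCT('egj3k', 'Revenue', DATE '2021-02-05', DATE '2021-03-05', 12000)
  ])
),
islands AS (
  SELECT *,
         -- The difference of the two row numbers is constant within one island.
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate)
         - ROW_NUMBER() OVER (PARTITION BY ID, Stage ORDER BY CreatedDate) AS island_id
  FROM sample_rows
)
SELECT ID, Stage, CreatedDate, Amount,
       -- Both Revenue islands share one partition, so only the later one gets rank 1.
       RANK() OVER (PARTITION BY ID, Stage ORDER BY CreatedDate DESC) AS rank_by_stage,
       -- Adding island_id to the partition gives every island its own rank 1.
       RANK() OVER (PARTITION BY ID, Stage, island_id ORDER BY CreatedDate DESC) AS rank_by_island
FROM islands
ORDER BY CreatedDate;
```

With these three rows, the first Revenue row gets rank_by_stage = 2 and would be dropped by a `Rank_ = 1` filter, while rank_by_island is 1 for all three rows.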

Help me? ;) Is there a better way?

WITH CTE AS
    (
    SELECT  ID, 
            Stage, 
            CreatedDate, 
            EndDate,
            Amount,
            ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate)
            - ROW_NUMBER() OVER(PARTITION BY ID, Stage ORDER BY CreatedDate) As WindowId,
            ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate DESC)
            - ROW_NUMBER() OVER(PARTITION BY ID, Stage ORDER BY CreatedDate DESC) As ReverseWindowId
    FROM `BQ_Table`
    ),
    CTE_2 as (
    SELECT  ID,
            Stage,
            MIN(CreatedDate) OVER(PARTITION BY WindowId, ReverseWindowId) As StartDate,
            NULLIF(MAX(IFNULL(EndDate, '9999-12-31')) OVER(PARTITION BY WindowId, ReverseWindowId), '9999-12-31') As EndDate,
            Amount,
            row_number() OVER (PARTITION BY  ReverseWindowId, WindowId ORDER BY EndDate desc) as Rev_rank
    
    FROM CTE 


    ORDER BY StartDate)

    SELECT 

    ID,
    Stage,
    StartDate,
    EndDate,
    Amount

    FROM CTE_2 

    Where Rev_rank = 1

It works, but it's ugly. I'm still interested to see if anyone has a better idea.

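You can use an array trick to get the last value on each island. I find your logic rather cumbersome: I'm not sure why you use ROW_NUMBER() four times, why you check for NULL values in CreatedDate, or why you use SELECT DISTINCT rather than GROUP BY.

In any case, this should do what you want: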

WITH CTE AS (
      SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate) as seqnum,
             ROW_NUMBER() OVER (PARTITION BY ID, Stage ORDER BY CreatedDate) as seqnum_2
      FROM `BQ_TABLE` t
     )
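SELECT ID, Stage, MIN(CreatedDate) AS StartDate, MAX(EndDate) AS EndDate,
       -- Last Amount on the island; EndDate breaks ties between rows sharing a CreatedDate
       ARRAY_AGG(Amount ORDER BY CreatedDate DESC, EndDate DESC LIMIT 1)[SAFE_ORDINAL(1)] AS Amount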
FROM cte
GROUP BY ID, Stage, (seqnum - seqnum_2);
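
The array trick in the query above can be seen in isolation in the following minimal sketch. It assumes three inline rows taken from the question's sample data (stage names written in English); `rows_` and `LastAmount` are illustrative names only.

```sql
-- Sketch: take the Amount of the latest row in each group.
-- ARRAY_AGG sorts the group's values, LIMIT 1 keeps only the first one,
-- and [SAFE_ORDINAL(1)] reads that single element (NULL if the array is empty).
WITH rows_ AS (
  SELECT * FROM UNNEST([
    STRUCT('Proposal' AS Stage, DATE '2020-12-11' AS EndDate, 6500 AS Amount),
    STRUCT('Proposal', DATE '2020-12-15', 8000),
    STRUCT('Negotiation', DATE '2020-12-21', 7500)
  ])
)
SELECT Stage,
       ARRAY_AGG(Amount ORDER BY EndDate DESC LIMIT 1)[SAFE_ORDINAL(1)] AS LastAmount
FROM rows_
GROUP BY Stage;
-- Proposal -> 8000, Negotiation -> 7500
```

ORDINAL is 1-based and OFFSET is 0-based, so `[SAFE_ORDINAL(1)]` and `[OFFSET(0)]` read the same element; the SAFE_ prefix returns NULL instead of raising an error when the index is out of range.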

Consider the approach below:

select id, stage, min(CreatedDate) CreatedDate, max(EndDate) EndDate, 
  array_agg(amount order by EndDate desc limit 1)[offset(0)] amount
from (
  select *, countif(new_stage) over(partition by id order by CreatedDate, EndDate) stage_group
  from (
    select *, 
      ifnull(stage != lag(stage) over(partition by id order by CreatedDate, EndDate), true) new_stage
    from `BQ_Table`    
  )
)
group by id, stage, stage_group             

If applied to the sample data in your question, the output matches the desired result shown there.
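
To see how this approach forms its groups, the sketch below exposes the two intermediate columns. It assumes five inline rows taken from the question's sample data (stage names written in English); `sample_rows` is an illustrative name, not the real table.

```sql
-- Sketch: show new_stage and stage_group for a few sample rows.
WITH sample_rows AS (
  SELECT * FROM UNNEST([
    STRUCT('egj3k' AS ID, 'Onboarding' AS Stage, DATE '2020-12-21' AS CreatedDate,
           DATE '2020-12-23' AS EndDate, 10000 AS Amount),
    STRUCT('egj3k', 'Onboarding', DATE '2020-12-23', DATE '2020-12-25', 11000),
    STRUCT('egj3k', 'Revenue',    DATE '2020-12-25', DATE '2021-01-31', 15000),
    STRUCT('egj3k', 'Stalled',    DATE '2021-01-31', DATE '2021-02-05', 7000),
    STRUCT('egj3k', 'Revenue',    DATE '2021-02-05', DATE '2021-03-05', 12000)
  ])
)
SELECT ID, Stage, CreatedDate, EndDate, new_stage,
       -- Running count of stage changes: every island gets its own number.
       COUNTIF(new_stage) OVER (PARTITION BY ID ORDER BY CreatedDate, EndDate) AS stage_group
FROM (
  SELECT *,
         -- TRUE when the stage differs from the previous row, or when there is no previous row.
         IFNULL(Stage != LAG(Stage) OVER (PARTITION BY ID ORDER BY CreatedDate, EndDate), TRUE) AS new_stage
  FROM sample_rows
)
ORDER BY CreatedDate, EndDate;
-- new_stage:   TRUE, FALSE, TRUE, TRUE, TRUE
-- stage_group: 1, 1, 2, 3, 4
```

The two Revenue rows land in stage_group 2 and 4, which is why grouping by id, stage, stage_group keeps them as separate islands.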