Amazon Redshift SQL Count Windows Function

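I want to compute a rolling count of customers by loyalty tier. Each tier is based on LTV (0-124.99, 125-198.99, 199-749.99, and 750+). This is what I have so far; it returns 0 for almost every date, with a few 1s scattered across the years. Can anyone help with the window function?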

    SELECT TRUNC(CD.date) AS "day", tier, 
    COUNT(customer_email) OVER(PARTITION BY TRUNC(CD.date), tier ORDER BY 
    TRUNC(CD.date), tier 
    ASC ROWS UNBOUNDED PRECEDING) AS "customers"
    FROM {{ @public_fact_criquet_loyalty_master AS LM}}
    RIGHT JOIN public.dim_calendar_dates CD ON TRUNC(CD.date) = TRUNC(LM.timestamp)
    WHERE TRUNC(CD.date) BETWEEN '2011-11-09' AND CURRENT_DATE
    GROUP BY TRUNC(CD.date), tier, customer_email
    ORDER BY TRUNC(CD.date), tier ASC

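You have partitioned by TRUNC(CD.date), which resets the count on every date, so each partition only ever holds that one day's rows. Is this what you are looking for?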

COUNT(customer_email) OVER(PARTITION BY tier ORDER BY day ASC 
    ROWS UNBOUNDED PRECEDING) AS "customers"

P.S. You can use "day" in the window function because it is already defined as TRUNC(CD.date) in the first result column.
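For context, here is a rough sketch of how that window clause might sit in the full query. It is not tested against your data. `loyalty_master` is a hypothetical stand-in for the templated fact table in the question, and the window's ORDER BY spells out TRUNC(CD.date) instead of the alias so the sketch does not depend on alias resolution.

```sql
-- Sketch only: loyalty_master is a hypothetical name for the question's fact table.
SELECT TRUNC(CD.date) AS "day",
       LM.tier,
       -- Running count within each tier, ordered by day; no longer reset per date.
       COUNT(LM.customer_email) OVER (
           PARTITION BY LM.tier
           ORDER BY TRUNC(CD.date) ASC
           ROWS UNBOUNDED PRECEDING
       ) AS "customers"
FROM loyalty_master AS LM
RIGHT JOIN public.dim_calendar_dates AS CD
       ON TRUNC(CD.date) = TRUNC(LM.timestamp)
WHERE TRUNC(CD.date) BETWEEN '2011-11-09' AND CURRENT_DATE
GROUP BY TRUNC(CD.date), LM.tier, LM.customer_email
ORDER BY "day", LM.tier;
```

Two things to keep in mind: calendar days with no matching rows come through the RIGHT JOIN with a NULL tier and land in their own partition with a count of 0, and the frame counts rows, so a customer who appears on several days in the same tier is counted once per day.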

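The question has since changed into one about how to prepare the data for analysis, so I have added the code below to build base data that further queries can run against.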

drop table if exists test;
create table test (dt date, customer_email varchar(64), tier varchar(32));

insert into test values
('2/3/2021', 'xxxxxxxx@yyyyyy.com', '2almost_vip'),
('3/12/2021', 'xxxxxxxx@yyyyyy.com', '4the_players_club'),
('4/27/2021', 'xxxxxxxx@yyyyyy.com', '5the_players_club'),
('8/6/2021', 'xxxxxxxx@yyyyyy.com', '6the_players_club'),
('11/22/2021', 'xxxxxxxx@yyyyyy.com', '7the_players_club'),
('12/16/2021', 'xxxxxxxx@yyyyyy.com', '8the_players_club'),
('1/3/2021', 'abc@qrs.com', '2almost_vip'),
('2/12/2021', 'abc@qrs.com', '4the_players_club'),
('3/27/2021', 'abc@qrs.com', '5the_players_club'),
('7/6/2021', 'abc@qrs.com', '6the_players_club'),
('10/22/2021', 'abc@qrs.com', '7the_players_club'),
('11/16/2021', 'abc@qrs.com', '8the_players_club');

with recursive dates(d) as (
select '2021-01-01'::date as d
union all
select d + 1 as d
from dates 
where d < '2021-12-31')  -- This CTE creates dates from 2021-01-01 to 2021-12-31
select dt, customer_email, 
    LAG(min(tier), 1) IGNORE NULLS OVER (PARTITION BY customer_email ORDER BY dt ASC) AS tier -- Find the previous non-NULL tier - remember that group by runs before window
from (
    select dt, customer_email, tier as tier  -- combine your input data with ...
    from test
    union all
    select d as dt, customer_email, null as tier -- a list of all dates per email
    from dates 
    cross join (select DISTINCT customer_email from test)  -- combine all dates with all emails
    )
group by customer_email, dt
order by customer_email, dt;

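The output of this query (based on the dummy data I entered) is a list of every date for every email, with the tier filled in. I am not clear on what next steps you intend, so I will leave it here. Because this process produces far more rows than it starts with, it will not be very efficient, and the final result most likely does not need all of them. If this data expansion can be avoided, processing should be faster.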
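If the goal is still a count of customers per tier per day, one possible next step is to wrap the query above in a second CTE and aggregate it. This is only a sketch under that assumption: it relies on the `test` table created above, the CTE name `filled` and the subquery aliases `emails` and `all_rows` are made up for the example, and the tier expression is copied unchanged from the query above.

```sql
with recursive dates(d) as (
    select '2021-01-01'::date as d
    union all
    select d + 1 as d
    from dates
    where d < '2021-12-31'
),
filled as (
    -- Same logic as the query above: one row per email per date, tier carried forward.
    select dt, customer_email,
        LAG(min(tier), 1) IGNORE NULLS OVER (PARTITION BY customer_email ORDER BY dt ASC) AS tier
    from (
        select dt, customer_email, tier
        from test
        union all
        select d as dt, customer_email, null as tier
        from dates
        cross join (select DISTINCT customer_email from test) as emails
    ) as all_rows
    group by customer_email, dt
)
-- filled has one row per (customer_email, dt), so counting rows counts customers.
select dt, tier, count(customer_email) as customers
from filled
where tier is not null  -- drop dates before a customer's first known tier
group by dt, tier
order by dt, tier;
```

Because LAG looks at the previous row, a customer moves to a new tier in this output on the day after the change is recorded. If the change should take effect on the same day, the tier expression would need adjusting.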