Amazon Redshift SQL Count Window Function
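I want a rolling count of customers by loyalty tier. Each tier is based on LTV (0-124.99, 125-198.99, 199-749.99, and 750+). This is what I have so far; it returns 0 for nearly every date, with a few 1s scattered across the years. Can anyone help with the window function?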
SELECT TRUNC(CD.date) AS "day", tier,
COUNT(customer_email) OVER(PARTITION BY TRUNC(CD.date), tier ORDER BY
TRUNC(CD.date), tier
ASC ROWS UNBOUNDED PRECEDING) AS "customers"
FROM {{ @public_fact_criquet_loyalty_master AS LM}}
RIGHT JOIN public.dim_calendar_dates CD ON TRUNC(CD.date) = TRUNC(LM.timestamp)
WHERE TRUNC(CD.date) BETWEEN '2011-11-09' AND CURRENT_DATE
GROUP BY TRUNC(CD.date), tier, customer_email
ORDER BY TRUNC(CD.date), tier ASC
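You have partitioned by TRUNC(CD.date), which breaks up the counting groups in a way you don't want: each partition holds only one day's rows for one tier, so the count restarts on every date. Is this what you are looking for?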
COUNT(customer_email) OVER(PARTITION BY tier ORDER BY day ASC
ROWS UNBOUNDED PRECEDING) AS "customers"
P.S. You can use "day" in the window function because it is already defined as TRUNC(CD.date) in the first result column.
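To make the difference between the two PARTITION BY clauses concrete, here is a minimal sketch on made-up data. The table tier_events, its columns, and the email addresses are hypothetical and are not taken from the question's schema; each sample customer appears on a different day within a tier, which mirrors the "scattered 1s" symptom.

```sql
-- Hypothetical sample data: one row per customer per day.
create temp table tier_events (event_day date, tier varchar(32), customer_email varchar(64));
insert into tier_events values
('2021-01-01', 'vip',   'a@example.com'),
('2021-01-02', 'vip',   'b@example.com'),
('2021-01-03', 'vip',   'c@example.com'),
('2021-01-02', 'basic', 'd@example.com');

select event_day, tier, customer_email,
       -- Partitioned by day and tier: the count restarts on every date.
       count(customer_email) over (partition by event_day, tier
                                   order by event_day asc
                                   rows unbounded preceding) as per_day_count,
       -- Partitioned by tier only: the count keeps accumulating across dates.
       count(customer_email) over (partition by tier
                                   order by event_day asc
                                   rows unbounded preceding) as running_count
from tier_events
order by tier, event_day;
-- per_day_count is 1 on every row; running_count is 1 for basic and 1, 2, 3 for vip.
```

Calendar dates with no matching customer have a NULL customer_email after the RIGHT JOIN, and COUNT skips NULLs, which would explain the 0 rows in the original output.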
The question has since changed to how to prepare the data for analysis, so I have added this code to create base data that allows further queries.
drop table if exists test;
create table test (dt date, customer_email varchar(64), tier varchar(32));
insert into test values
('2/3/2021', 'xxxxxxxx@yyyyyy.com', '2almost_vip'),
('3/12/2021', 'xxxxxxxx@yyyyyy.com', '4the_players_club'),
('4/27/2021', 'xxxxxxxx@yyyyyy.com', '5the_players_club'),
('8/6/2021', 'xxxxxxxx@yyyyyy.com', '6the_players_club'),
('11/22/2021', 'xxxxxxxx@yyyyyy.com', '7the_players_club'),
('12/16/2021', 'xxxxxxxx@yyyyyy.com', '8the_players_club'),
('1/3/2021', 'abc@qrs.com', '2almost_vip'),
('2/12/2021', 'abc@qrs.com', '4the_players_club'),
('3/27/2021', 'abc@qrs.com', '5the_players_club'),
('7/6/2021', 'abc@qrs.com', '6the_players_club'),
('10/22/2021', 'abc@qrs.com', '7the_players_club'),
('11/16/2021', 'abc@qrs.com', '8the_players_club');
with recursive dates(d) as (
select '2021-01-01'::date as d
union all
select d + 1 as d
from dates
where d < '2021-12-31') -- This CTE generates every date from 2021-01-01 through 2021-12-31
select dt, customer_email,
LAG(min(tier), 1) IGNORE NULLS OVER (PARTITION BY customer_email ORDER BY dt ASC) AS tier -- Find the previous non-NULL tier - remember that group by runs before window
from (
select dt, customer_email, tier as tier -- combine your input data with ...
from test
union all
select d as dt, customer_email, null as tier -- a list of all dates per email
from dates
cross join (select DISTINCT customer_email from test) -- combine all dates with all emails
)
group by customer_email, dt
order by customer_email, dt;
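The output of this query (based on the dummy data I entered) is a list of all dates for all emails, with their tiers filled in. It is not clear to me what follow-up steps you intend, so I will leave it here. Because this process produces far more data than it starts with, it will not be very efficient, and the final result most likely does not need all of that data. If this data expansion can be avoided, processing should be faster.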
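If the eventual goal is the number of customers in each tier on each day, one possible next step is a plain aggregate over the filled-in rows. This is only a sketch of a guess at the intent: filled_tiers is a hypothetical table assumed to hold the output of the query above, and the four sample rows are made up to be consistent with the dummy data.

```sql
-- Hypothetical table holding the filled-in output: one row per email per date.
create temp table filled_tiers (dt date, customer_email varchar(64), tier varchar(32));
insert into filled_tiers values
('2021-02-12', 'abc@qrs.com',         '2almost_vip'),
('2021-02-12', 'xxxxxxxx@yyyyyy.com', '2almost_vip'),
('2021-02-13', 'abc@qrs.com',         '4the_players_club'),
('2021-02-13', 'xxxxxxxx@yyyyyy.com', '2almost_vip');

-- Count how many distinct customers sit in each tier on each date.
select dt, tier, count(distinct customer_email) as customers
from filled_tiers
where tier is not null  -- dates before a customer's first tier have no value
group by dt, tier
order by dt, tier;
-- 2021-02-12 | 2almost_vip       | 2
-- 2021-02-13 | 2almost_vip       | 1
-- 2021-02-13 | 4the_players_club | 1
```

Filtering to the dates that are actually needed before aggregating would also limit the data expansion mentioned above.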