Count of difference in values by date
I have a dataset with two columns [date, cust_id].
date cust_id
2019-12-08 123
2019-12-08 321
2019-12-09 123
2019-12-09 456
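I have high customer churn, and I am trying to create two additional columns [new_cust, left_cust] by counting, for each day, the number of new cust_ids and the number of cust_ids that have left.
When each day is in its own table, I have no problem getting this with queries:
New customers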
SELECT DISTINCT cust_id
FROM "2019-12-09"
WHERE cust_id NOT IN (SELECT DISTINCT cust_id FROM "2019-12-08")
Lost customers
SELECT DISTINCT cust_id
FROM "2019-12-08"
WHERE cust_id NOT IN (SELECT DISTINCT cust_id FROM "2019-12-09")
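For reference, the two queries above amount to two set differences between the days. A minimal Python sketch using only the sample rows from the question (the variable names are made up for illustration):

```python
# cust_ids present on each day, copied from the sample data in the question
day_2019_12_08 = {123, 321}
day_2019_12_09 = {123, 456}

# New customers: present on the later day, absent on the earlier one
new_customers = day_2019_12_09 - day_2019_12_08

# Lost customers: present on the earlier day, absent on the later one
lost_customers = day_2019_12_08 - day_2019_12_09

print(sorted(new_customers))   # [456]
print(sorted(lost_customers))  # [321]
```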
I am not sure how to query a single table and compare these values by date. What is the best way to get the correct result? I am using AWS Athena.
Expected result:
date new_cust left_cust
2019-12-08 2 0
2019-12-09 1 1
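Explanation: assuming 2019-12-08 is the first date, I have 2 new customers and 0 lost customers. On 2019-12-09, I gained 1 new customer, "456", but 1 customer, "321", left. I will have to apply this to a much longer range of dates and cust_ids.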
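The expected result can be reproduced outside the database as a sanity check. The sketch below is not an Athena query; it is a small Python reference that assumes each date is compared with the previous date present in the data (not necessarily the previous calendar day). The function name `daily_churn` is hypothetical.

```python
from collections import defaultdict


def daily_churn(rows):
    """rows: iterable of (date_string, cust_id) pairs.

    Returns a list of (date, new_cust, left_cust) tuples, one per date.
    Assumption: each date is compared with the previous date in the data.
    """
    by_date = defaultdict(set)
    for day, cust_id in rows:
        by_date[day].add(cust_id)

    result = []
    previous = set()  # nothing precedes the first date
    for day in sorted(by_date):  # ISO-formatted dates sort correctly as strings
        current = by_date[day]
        result.append((day, len(current - previous), len(previous - current)))
        previous = current
    return result


rows = [
    ("2019-12-08", 123),
    ("2019-12-08", 321),
    ("2019-12-09", 123),
    ("2019-12-09", 456),
]
for line in daily_churn(rows):
    print(line)
# ('2019-12-08', 2, 0)
# ('2019-12-09', 1, 1)
```

Under this definition a customer who disappears and later returns is counted as new again, which matches the two-table NOT IN logic.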
Hmm. I think you want:
select date,
       -- no earlier row for this cust_id, so this is its first appearance
       sum(case when prev_date is null then 1 else 0 end) as new_cust,
       -- no row on the following day, so the customer is counted as leaving after this date
       sum(case when next_date = date + interval '1' day then 0 else 1 end) as left_cust
from (select t.*,
             lag(date) over (partition by cust_id order by date) as prev_date,
             lead(date) over (partition by cust_id order by date) as next_date
      from t
     ) t
group by date;
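To see what the lag/lead logic produces on the sample data, here is a rough Python emulation of the query. It is only a sketch, not Athena code, and it assumes there are no duplicate (date, cust_id) rows.

```python
from collections import defaultdict
from datetime import date, timedelta

rows = [
    (date(2019, 12, 8), 123),
    (date(2019, 12, 8), 321),
    (date(2019, 12, 9), 123),
    (date(2019, 12, 9), 456),
]

# Equivalent of "partition by cust_id order by date"
dates_by_cust = defaultdict(list)
for day, cust_id in rows:
    dates_by_cust[cust_id].append(day)

new_cust = defaultdict(int)
left_cust = defaultdict(int)
for cust_id, days in dates_by_cust.items():
    days.sort()
    for i, day in enumerate(days):
        prev_date = days[i - 1] if i > 0 else None              # lag(date)
        next_date = days[i + 1] if i + 1 < len(days) else None  # lead(date)
        new_cust[day] += 1 if prev_date is None else 0
        left_cust[day] += 0 if next_date == day + timedelta(days=1) else 1

for day in sorted(new_cust):
    print(day, new_cust[day], left_cust[day])
# 2019-12-08 2 1
# 2019-12-09 1 2
```

Note that new_cust matches the expected result, but left_cust comes out as 1 and 2 rather than 0 and 1. The departure of "321" is recorded on 2019-12-08, the last day that customer was seen, and every customer on the final date has no following row, so all of them are counted as having left. If the loss should be reported on the following date, the left_cust count would need to be shifted forward by one day and the last date in the data handled separately.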