Don't want to double count in Filtered Aggregation
Sample data:
shopper_id | last_purchase_timestamp | active_p30 | active_p60 | active_over_p90 |
---|---|---|---|---|
1 | 2022-03-02 1:20:00 | TRUE | TRUE | TRUE |
2 | 2022-03-01 1:30:00 | TRUE | TRUE | TRUE |
3 | 2022-02-28 1:24:03 | TRUE | TRUE | TRUE |
4 | 2022-02-02 21:22:26 | FALSE | TRUE | TRUE |
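I want to count whether shoppers were active (based on their last purchase) in the past 30 days (counting back from March 5), the past 60 days, and so on.
My goal is to find out how many shoppers made their last purchase within the past 30 days, how many made it within the past 60 days, and so on. But I don't want to double count.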
What I tried:
    count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day)
        AS total_active_p30,
    count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day)
        AS total_active_p60,
    count(*) FILTER (where last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day)
        AS total_active_p90
Result:
total_active_p30 | total_active_p60 | total_active_p90 |
---|---|---|
3 | 4 | 4 |
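However, this double counts: a shopper whose last purchase falls in the 30-day window is counted again in the 60-day and 90-day windows. How can I prevent that? The counts should add up to 4.
My ideal output is:
total_active_p30 | total_active_p60 | total_active_p90 |
---|---|---|
3 | 1 | 0 |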
Thanks in advance, everyone! I'm using Trino.
Add both an upper and a lower bound to each filter so that the ranges are disjoint. Something along these lines:
    -- sample data
    WITH dataset (last_purchase_timestamp) AS (
        VALUES (timestamp '2022-03-02 1:20:00'),
               (timestamp '2022-03-01 1:30:00'),
               (timestamp '2022-02-28 1:24:03'),
               (timestamp '2022-02-02 21:22:26')
    )
    -- query
    select count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day) total_active_p30,
           count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day
                    and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '30' day) total_active_p60,
           count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day
                    and last_purchase_timestamp < DATE '2022-03-05' - INTERVAL '60' day) total_active_p90
    from dataset
Output:
total_active_p30 | total_active_p60 | total_active_p90 |
---|---|---|
3 | 1 | 0 |
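A variation on the same idea, offered as a sketch rather than a tested drop-in: classify each row once with a `CASE` expression and group by the resulting label. Because `CASE` stops at the first branch that matches, a shopper can only land in one bucket. The `bucket` and `shoppers` column names and the label strings are made up for illustration; the sample rows are the ones from the question.

```sql
-- sample data (same four rows as in the question)
WITH dataset (last_purchase_timestamp) AS (
    VALUES (timestamp '2022-03-02 01:20:00'),
           (timestamp '2022-03-01 01:30:00'),
           (timestamp '2022-02-28 01:24:03'),
           (timestamp '2022-02-02 21:22:26')
)
-- each row is assigned to the first (most recent) window it falls into
SELECT
    CASE
        WHEN last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day THEN 'p30'
        WHEN last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day THEN 'p60'
        WHEN last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day THEN 'p90'
        ELSE 'older'
    END AS bucket,          -- hypothetical label column
    count(*) AS shoppers    -- hypothetical count column
FROM dataset
GROUP BY 1
ORDER BY 1
```

With the sample data this should put 3 shoppers in `p30` and 1 in `p60`. A bucket with no rows, such as `p90` here, does not appear in the result at all, so the `count_if` form is probably the better fit when a fixed set of columns is required.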
The filter conditions in your query overlap: every row that satisfies `>= DATE '2022-03-05' - INTERVAL '60' day` also satisfies `>= DATE '2022-03-05' - INTERVAL '90' day`, so it is counted under both. To avoid that, bound each range on both sides:
    count(*) filter (where last_purchase_timestamp >= (DATE '2022-03-05' - INTERVAL '30' day))
        as total_active_p30,
    count(*) filter (where last_purchase_timestamp >= (DATE '2022-03-05' - INTERVAL '60' day)
                       and last_purchase_timestamp < (DATE '2022-03-05' - INTERVAL '30' day))
        as total_active_p60,
    count(*) filter (where last_purchase_timestamp >= (DATE '2022-03-05' - INTERVAL '90' day)
                       and last_purchase_timestamp < (DATE '2022-03-05' - INTERVAL '60' day))
        as total_active_p90
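Since the cumulative windows are nested (30 days inside 60 days inside 90 days), the disjoint counts can also be derived by subtracting neighbouring cumulative totals. A minimal sketch, assuming the same sample rows as in the question; the `cumulative` CTE and the `c30`, `c60`, `c90` names are hypothetical:

```sql
-- sample data (same four rows as in the question)
WITH dataset (last_purchase_timestamp) AS (
    VALUES (timestamp '2022-03-02 01:20:00'),
           (timestamp '2022-03-01 01:30:00'),
           (timestamp '2022-02-28 01:24:03'),
           (timestamp '2022-02-02 21:22:26')
),
-- cumulative counts, i.e. the overlapping totals from the question
cumulative AS (
    SELECT
        count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '30' day) AS c30,
        count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '60' day) AS c60,
        count_if(last_purchase_timestamp >= DATE '2022-03-05' - INTERVAL '90' day) AS c90
    FROM dataset
)
-- subtracting the narrower window leaves only the rows unique to the wider one
SELECT
    c30       AS total_active_p30,
    c60 - c30 AS total_active_p60,
    c90 - c60 AS total_active_p90
FROM cumulative
```

For the sample data the cumulative totals are 3, 4 and 4, so the differences come out as 3, 1 and 0. This only works because each window fully contains the previous one; for windows that are not nested, explicit upper and lower bounds are the safer choice.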