Left join lateral for conditional sums
I have a purchase dataset with customers, products and categories.
customer  product  category  sales_value
A         aerosol  air_care  10
B         aerosol  air_care  12
C         aerosol  air_care  7
A         perfume  air_care  8
A         perfume  air_care  2
D         perfume  air_care  11
C         burger   food      13
D         fries    food      6
C         fries    food      9
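For each product, I want the ratio between the sales value spent on this product and the sales value spent on this product's category, by the customers who bought the product at least once.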
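Put another way: take the customers who bought fries at least once and, across all of them, compute A) the sum of the sales value spent on fries and B) the sum of the sales value spent on food.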
The intermediate table would look like this (both sums are taken over the people buying the product):
product  category  sum_spent_on_product  sum_spent_on_category  ratio
aerosol  air_care  29                    39                     0.74
perfume  air_care  21                    31                     0.68
burger   food      13                    22                     0.59
fries    food      15                    28                     0.53
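Example: suppose the people who bought aerosol at least once spent a total of 1800 on this product, and the same people spent 3600 overall on the air_care category (to which aerosol belongs). The ratio for aerosol is then 0.5.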
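The definition can be checked without any SQL. The following is a minimal pure-Python sketch (not part of the original question) that recomputes the intermediate table from the sample rows. It assumes each product belongs to exactly one category; `ROWS` is copied from the sample data and `product_category_ratios` is a hypothetical helper name.

```python
from collections import defaultdict

# Sample rows from the question: (customer, product, category, sales_value)
ROWS = [
    ("A", "aerosol", "air_care", 10),
    ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7),
    ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2),
    ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13),
    ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]


def product_category_ratios(rows):
    """Hypothetical helper: {product: (category, on_product, on_category, ratio)}."""
    buyers = defaultdict(set)         # product -> customers who bought it at least once
    category_of = {}                  # product -> category (assumes one category per product)
    on_product = defaultdict(int)     # product -> total spent on it (every spender is a buyer)
    per_cust_cat = defaultdict(int)   # (customer, category) -> that customer's category total
    for customer, product, category, value in rows:
        buyers[product].add(customer)
        category_of[product] = category
        on_product[product] += value
        per_cust_cat[(customer, category)] += value

    result = {}
    for product, customers in buyers.items():
        category = category_of[product]
        # Each buyer contributes their category total exactly once,
        # no matter how many times they bought the product.
        on_category = sum(per_cust_cat[(c, category)] for c in customers)
        result[product] = (category, on_product[product], on_category,
                           on_product[product] / on_category)
    return result


for product, (category, spent, total, ratio) in sorted(product_category_ratios(ROWS).items()):
    print(f"{product:8} {category:9} {spent:3} {total:3} {ratio:.3f}")
```

Collecting buyers in a set is what makes each customer's category total count once per product; the SQL versions have to reproduce that when a customer buys the same product twice (customer A and perfume).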
I tried to solve this with left join lateral, computing the given intermediate result for each product, but I can't figure out how to include the condition "only for customers who bought this specific product":
select
distinct (product_id)
, category
, c.sales_category
from transactions t
left join lateral (
select
sum(sales_value) as sales_category
from transactions
where category = t.category
group by category
) c on true
;
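The query above lists, for each product, the total spent on the product's category, but without the required condition on who bought the product.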
Is left join lateral the right approach? Is there another solution in plain SQL?
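The missing condition can be expressed as a membership test on the buyers of the product. Below is a minimal plain-SQL sketch using correlated scalar subqueries, run through Python's standard `sqlite3` module as a stand-in for Postgres (an assumption). It assumes the table is `transactions` with a `product` column, as in the sample data, rather than the `product_id` used in the attempt; `buyer_restricted_sums` is a hypothetical helper. In Postgres the second subquery could equally be placed inside the `left join lateral (...) c on true` from the question, since it only refers to the outer row's product and category.

```python
import sqlite3

# Sample rows from the question: (customer, product, category, sales_value)
ROWS = [
    ("A", "aerosol", "air_care", 10),
    ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7),
    ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2),
    ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13),
    ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]

# The category subquery keeps the shape of the lateral attempt
# (same category as the outer row) and adds the missing buyer condition.
QUERY = """
SELECT p.product,
       p.category,
       (SELECT SUM(t.sales_value)
          FROM transactions t
         WHERE t.product = p.product) AS sum_spent_on_product,
       (SELECT SUM(t.sales_value)
          FROM transactions t
         WHERE t.category = p.category
           AND t.customer IN (SELECT b.customer
                                FROM transactions b
                               WHERE b.product = p.product)) AS sum_spent_on_category
  FROM (SELECT DISTINCT product, category FROM transactions) AS p
 ORDER BY p.product
"""


def buyer_restricted_sums(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE transactions "
                    "(customer TEXT, product TEXT, category TEXT, sales_value INTEGER)")
        con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for product, category, on_product, on_category in buyer_restricted_sums(ROWS):
    print(product, category, on_product, on_category, round(on_product / on_category, 3))
```

Because `IN` only tests membership, customer A appearing twice among the perfume buyers does not duplicate A's air_care rows.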
I want, for each product, the ratio between the sales value spent on this product, and the sales value spent on this product's category, by the customers who bought the product at least once.
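If I understand correctly, you can aggregate sales by person and category to get the category totals. In Postgres, you can keep an array of the products and use it for matching. So, the query looks like: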
select p.product, p.category,
sum(p.sales_value) as product_only_sales,
sum(pp.sales_value) as comparable_sales
from purchases p join
(select customer, category, array_agg(distinct product) as products, sum(sales_value) as sales_value
from purchases p
group by customer, category
) pp
on p.customer = pp.customer and p.category = pp.category and p.product = any (pp.products)
group by p.product, p.category;
Here is a db<>fiddle.
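Edit:
The data allows the same customer to buy the same product more than once (duplicate rows). That messes things up. The solution is to pre-aggregate by product for each customer: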
select p.product, p.category, sum(p.sales_value) as product_only_sales, sum(pp.sales_value) as comparable_sales
from (select customer, category, product, sum(sales_value) as sales_value
from purchases p
group by customer, category, product
) p join
(select customer, category, array_agg(distinct product) as products, sum(sales_value) as sales_value
from purchases p
group by customer, category
) pp
on p.customer = pp.customer and p.category = pp.category and p.product = any (pp.products)
group by p.product, p.category;
Here is a db<>fiddle for this example.
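The effect of the duplicate rows can be reproduced locally. This sketch runs both versions of the join through Python's `sqlite3` (an assumption, as a stand-in for Postgres; `run` is a hypothetical helper). SQLite has no `array_agg` or `= any(...)`, so the array test is dropped. That does not change the result: `pp.products` for a given customer and category always contains the product of every row of that customer in that category, so the condition is always true.

```python
import sqlite3

# Sample rows from the question: (customer, product, category, sales_value)
ROWS = [
    ("A", "aerosol", "air_care", 10),
    ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7),
    ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2),
    ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13),
    ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]

# Per-customer category totals, as in the answer's "pp" subquery (minus the array).
PP = """(SELECT customer, category, SUM(sales_value) AS sales_value
           FROM purchases GROUP BY customer, category) pp"""

# First version: joins raw rows, so each of A's two perfume rows picks up A's total.
NAIVE = f"""
SELECT p.product, p.category, SUM(p.sales_value), SUM(pp.sales_value)
  FROM purchases p
  JOIN {PP} ON p.customer = pp.customer AND p.category = pp.category
 GROUP BY p.product, p.category
 ORDER BY p.product
"""

# Edited version: pre-aggregate to one row per customer and product first.
PRE_AGGREGATED = f"""
SELECT p.product, p.category, SUM(p.sales_value), SUM(pp.sales_value)
  FROM (SELECT customer, category, product, SUM(sales_value) AS sales_value
          FROM purchases GROUP BY customer, category, product) p
  JOIN {PP} ON p.customer = pp.customer AND p.category = pp.category
 GROUP BY p.product, p.category
 ORDER BY p.product
"""


def run(query, rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE purchases "
                    "(customer TEXT, product TEXT, category TEXT, sales_value INTEGER)")
        con.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


for name, query in (("naive", NAIVE), ("pre-aggregated", PRE_AGGREGATED)):
    print(name, run(query, ROWS))
```

Only the perfume row differs: the first version counts customer A's air_care total twice, while the pre-aggregated version matches the expected 21 and 31.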
I would use a window function to compute each customer's total spend in each category:
SELECT
customer, product, category, sales_value,
sum(sales_value) OVER (PARTITION BY customer, category) AS tot_cat
FROM transactions;
customer | product | category | sales_value | tot_cat
----------+---------+----------+-------------+---------
A | aerosol | air_care | 10.00 | 20.00
A | perfume | air_care | 8.00 | 20.00
A | perfume | air_care | 2.00 | 20.00
B | aerosol | air_care | 12.00 | 12.00
C | aerosol | air_care | 7.00 | 7.00
C | fries | food | 9.00 | 22.00
C | burger | food | 13.00 | 22.00
D | perfume | air_care | 11.00 | 11.00
D | fries | food | 6.00 | 6.00
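These per-row totals cannot be summed per product as they stand. A sketch through Python's `sqlite3` (assumed to be SQLite 3.25 or later, for window-function support; `run` is a hypothetical helper) runs the query above and then naively sums `tot_cat` by product. Customer A's air_care total is picked up once for each of A's two perfume rows.

```python
import sqlite3

# Sample rows from the question: (customer, product, category, sales_value)
ROWS = [
    ("A", "aerosol", "air_care", 10),
    ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7),
    ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2),
    ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13),
    ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]

# The answer's query: every raw row carries its customer's category total.
WINDOWED = """
SELECT customer, product, category, sales_value,
       SUM(sales_value) OVER (PARTITION BY customer, category) AS tot_cat
  FROM transactions
"""

# Naively summing tot_cat per product over the raw rows.
NAIVE_SUM = f"""
SELECT w.product, SUM(w.tot_cat)
  FROM ({WINDOWED}) AS w
 GROUP BY w.product
 ORDER BY w.product
"""


def run(query, rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE transactions "
                    "(customer TEXT, product TEXT, category TEXT, sales_value INTEGER)")
        con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


print(run(NAIVE_SUM, ROWS))
```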
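Next, we sum these per product. The problem arises when a customer buys the same product more than once: in your example, customer A bought perfume twice. To overcome this, let's group by customer, product and category together (and sum the sales_value column):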
SELECT
customer, product, category, SUM(sales_value) AS sales_value,
SUM(SUM(sales_value)) OVER (PARTITION BY customer, category) AS tot_cat
FROM transactions
GROUP BY customer, product, category
customer | product | category | sales_value | tot_cat
----------+---------+----------+-------------+---------
A | aerosol | air_care | 10.00 | 20.00
A | perfume | air_care | 10.00 | 20.00 <-- this row summarizes rows 2 and 3 of previous result
B | aerosol | air_care | 12.00 | 12.00
C | aerosol | air_care | 7.00 | 7.00
C | burger | food | 13.00 | 22.00
C | fries | food | 9.00 | 22.00
D | perfume | air_care | 11.00 | 11.00
D | fries | food | 6.00 | 6.00
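A sketch of this step in SQLite through Python's `sqlite3` (assumed 3.25 or later; `run` is a hypothetical helper). It is written with a derived table `g` holding the per-customer, per-product sums and a plain window `SUM` on top; this is equivalent to the `SUM(SUM(sales_value)) OVER (...)` form above, just with the two aggregation levels spelled out separately.

```python
import sqlite3

# Sample rows from the question: (customer, product, category, sales_value)
ROWS = [
    ("A", "aerosol", "air_care", 10),
    ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7),
    ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2),
    ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13),
    ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]

# Inner level: one row per (customer, product, category).
# Outer level: window total per (customer, category) over those rows.
GROUPED = """
SELECT customer, product, category, sales_value,
       SUM(sales_value) OVER (PARTITION BY customer, category) AS tot_cat
  FROM (SELECT customer, product, category, SUM(sales_value) AS sales_value
          FROM transactions
         GROUP BY customer, product, category) AS g
 ORDER BY customer, product
"""


def run(query, rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE transactions "
                    "(customer TEXT, product TEXT, category TEXT, sales_value INTEGER)")
        con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


for row in run(GROUPED, ROWS):
    print(row)
```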
Now we just need to sum sales_value and tot_cat per product to get the intermediate result table. I use a common table expression to refer to the previous result under the name t:
WITH t AS (
SELECT
customer, product, category, SUM(sales_value) AS sales_value,
SUM(SUM(sales_value)) OVER (PARTITION BY customer, category) AS tot_cat
FROM transactions
GROUP BY customer, product, category
)
SELECT
product, category,
sum(sales_value) AS sales_value, sum(tot_cat) AS tot_cat,
sum(sales_value) / sum(tot_cat) AS ratio
FROM t
GROUP BY product, category;
product | category | sales_value | tot_cat | ratio
---------+----------+-------------+---------+------------------------
aerosol | air_care | 29.00 | 39.00 | 0.74358974358974358974
fries | food | 15.00 | 28.00 | 0.53571428571428571429
burger | food | 13.00 | 22.00 | 0.59090909090909090909
perfume | air_care | 21.00 | 31.00 | 0.67741935483870967742
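The complete query can be checked the same way. In this sketch (Python's `sqlite3`, assumed 3.25 or later; `run` is a hypothetical helper) the CTE uses a derived table instead of nested aggregates, and the ratio multiplies by 1.0 because SQLite truncates integer division, as Postgres also does for integer columns. The output above shows sales_value as a numeric type, where plain `/` already yields a fraction.

```python
import sqlite3

# Sample rows from the question: (customer, product, category, sales_value)
ROWS = [
    ("A", "aerosol", "air_care", 10),
    ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7),
    ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2),
    ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13),
    ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]

QUERY = """
WITH t AS (
  SELECT customer, product, category, sales_value,
         SUM(sales_value) OVER (PARTITION BY customer, category) AS tot_cat
    FROM (SELECT customer, product, category, SUM(sales_value) AS sales_value
            FROM transactions
           GROUP BY customer, product, category) AS g
)
SELECT product, category,
       SUM(sales_value) AS sales_value,
       SUM(tot_cat) AS tot_cat,
       1.0 * SUM(sales_value) / SUM(tot_cat) AS ratio  -- 1.0 forces real division
  FROM t
 GROUP BY product, category
 ORDER BY product
"""


def run(query, rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE transactions "
                    "(customer TEXT, product TEXT, category TEXT, sales_value INTEGER)")
        con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


for product, category, spent, total, ratio in run(QUERY, ROWS):
    print(product, category, spent, total, round(ratio, 4))
```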