sql redshift: Create table with pairs based on one column values plus the number of purchases (events) of each combination

Question

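I have a table with two columns: product and client. I need to build every pair of products and add a third column with the number of clients who bought both products in the pair.

Example: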

clients product
001 pants
001 shirt
001 pants
002 pants
002 shirt
002 shoes

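I need to rearrange the products into tuples (pairs) and add a third column with the number of unique clients who bought both products. For the example above, the result would be: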

product1 product2 count
pants shirt 2
pants shoes 1
shirt shoes 1

I want to avoid duplicated information. For example, a row 'shirt pants 2' is not needed.

Does anyone know how to do this?

Thanks!

Answer 1

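Hmmm . . . You have duplicates, so this might get messy.

The simple method is a self join on the client followed by a group by. The condition t1.product < t2.product keeps each pair once, in alphabetical order: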

select t1.product, t2.product, count(distinct t1.client)
from t t1 join
     t t2
     on t1.client = t2.client and t1.product < t2.product
group by t1.product, t2.product;

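This approach can be quite expensive, especially when there are many duplicates, because the join multiplies the duplicated rows before count(distinct) collapses them.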

An alternative is to use distinct before doing the join:

select pc.product as product1, pc2.product as product2, count(*)
from (select distinct product, client from t) pc join
     (select distinct product, client from t) pc2
     on pc2.client = pc.client and pc.product < pc2.product
group by pc.product, pc2.product;
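
As a quick sanity check, the pair-counting logic can be run against the sample data from the question. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Redshift (an assumption, since the two engines differ in many other respects), and the helper name count_product_pairs is hypothetical. The table name t and the columns client and product follow the queries above.

```python
import sqlite3


def count_product_pairs(rows):
    """Return (product1, product2, clients) for each product pair bought by the same client.

    rows is an iterable of (client, product) tuples; duplicates are allowed.
    SQLite is used here only as a stand-in for Redshift (assumption).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (client text, product text)")
    conn.executemany("insert into t (client, product) values (?, ?)", rows)
    query = """
        select pc.product as product1, pc2.product as product2, count(*) as clients
        from (select distinct product, client from t) pc join
             (select distinct product, client from t) pc2
             on pc2.client = pc.client and pc.product < pc2.product
        group by pc.product, pc2.product
        order by product1, product2
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample data from the question, including the duplicated (001, pants) row.
sample = [
    ("001", "pants"),
    ("001", "shirt"),
    ("001", "pants"),
    ("002", "pants"),
    ("002", "shirt"),
    ("002", "shoes"),
]

pairs = count_product_pairs(sample)
for product1, product2, clients in pairs:
    print(product1, product2, clients)
```

Because the inner subqueries deduplicate (product, client) first, the repeated pants purchase by client 001 is counted once, and the strict inequality on product keeps only one ordering of each pair.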
