How to find overlap between groups in BigQuery SQL

I have the following dataset:

| user_id | campaignid |
|---------|------------|
| 1       | campaign1  |
| 1       | campaign2  |
| 2       | campaign1  |
| 1       | campaign3  |
| 3       | campaign5  |
| 3       | campaign2  |
| 3       | campaign3  |
| 4       | campaign6  |
| 5       | campaign5  |

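I am trying to find the overlap of user_id across campaigns; in other words, how many of the people in campaign 1 are also in campaign 2.

I can find the distinct users in each campaign with a GROUP BY on the campaign ID, but I need help with the overlap between different campaigns. The result I am trying to achieve can be illustrated with the matrix below: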

| Campaign ID | 1 | 2 | 3 | 4 | 5 | 6 |
|-------------|---|---|---|---|---|---|
| 1           |   |   |   |   |   |   |
| 2           |   |   |   |   |   |   |
| 3           |   |   |   |   |   |   |
| 4           |   |   |   |   |   |   |

The diagonal gives just the people in campaign1, and the campaign1-campaign2 cell gives the people who are in both campaign1 and campaign2.

Is there a way to do this in SQL (BigQuery)? Thanks.

This is simpler to generate as three columns:

  • campaignid1
  • campaignid2
  • number of users

For this, you can use a self-join and aggregation:

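-- Pair every campaign a user is in with every other campaign of that user.
-- Aliases are needed because BigQuery rejects duplicate output column names.
select d1.campaignid as campaignid1,
       d2.campaignid as campaignid2,
       count(*) as num_users
from dataset d1 join
     dataset d2
     on d1.user_id = d2.user_id
group by d1.campaignid, d2.campaignid;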
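
To try the self-join without creating a table, here is a minimal sketch that inlines rows based on the question's sample data as a CTE. The CTE, the `num_users` alias, and the switch to `count(distinct ...)` are assumptions for illustration; `count(distinct ...)` only matters if the same user can appear more than once in a campaign, where `count(*)` would overcount.

```sql
-- Inline sample data (assumed to mirror the question's table)
with dataset as (
  select 1 as user_id, 'campaign1' as campaignid union all
  select 1, 'campaign2' union all
  select 2, 'campaign1' union all
  select 1, 'campaign3' union all
  select 3, 'campaign5' union all
  select 3, 'campaign2' union all
  select 3, 'campaign3' union all
  select 4, 'campaign6' union all
  select 5, 'campaign5'
)
select d1.campaignid as campaignid1,
       d2.campaignid as campaignid2,
       -- distinct guards against duplicate (user_id, campaignid) rows
       count(distinct d1.user_id) as num_users
from dataset d1 join
     dataset d2
     on d1.user_id = d2.user_id
group by d1.campaignid, d2.campaignid
order by campaignid1, campaignid2;
```

With this data, the campaign1/campaign1 row should show 2 users (users 1 and 2) and the campaign1/campaign2 row should show 1 (user 1). Pairs of campaigns with no shared users do not appear in the output at all.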

You can pivot these results, but that requires knowing the campaigns in advance:

select d1.campaignid,
       -- one countif per known campaign; ids are strings in the sample data
       countif(d2.campaignid = 'campaign1') as campaign_1,
       countif(d2.campaignid = 'campaign2') as campaign_2,
       countif(d2.campaignid = 'campaign3') as campaign_3,
       countif(d2.campaignid = 'campaign4') as campaign_4,
       countif(d2.campaignid = 'campaign5') as campaign_5
from dataset d1 join
     dataset d2
     on d1.user_id = d2.user_id
group by d1.campaignid;
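
As an alternative to the hand-written `countif` columns, BigQuery's `PIVOT` operator can produce the same matrix shape. This is only a sketch: the inline CTE, the column aliases, and the list of campaign values are assumptions for illustration, and the campaign list still has to be spelled out by hand.

```sql
-- Inline sample data (assumed to mirror the question's table)
with dataset as (
  select 1 as user_id, 'campaign1' as campaignid union all
  select 1, 'campaign2' union all
  select 2, 'campaign1' union all
  select 1, 'campaign3' union all
  select 3, 'campaign5' union all
  select 3, 'campaign2' union all
  select 3, 'campaign3' union all
  select 4, 'campaign6' union all
  select 5, 'campaign5'
)
select *
from (
  -- one row per (campaign, co-occurring campaign, user)
  select d1.campaignid as campaignid1,
         d2.campaignid as campaignid2,
         d1.user_id
  from dataset d1 join
       dataset d2
       on d1.user_id = d2.user_id
)
pivot (
  -- campaignid1 is the implicit grouping column; one output column per listed value
  count(user_id)
  for campaignid2 in ('campaign1' as campaign_1,
                      'campaign2' as campaign_2,
                      'campaign3' as campaign_3,
                      'campaign5' as campaign_5,
                      'campaign6' as campaign_6)
)
order by campaignid1;
```

The diagonal cells (for example row campaign1, column campaign_1) hold the number of users in that campaign alone, matching the matrix described in the question.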