I need to join 3 tables avoiding duplicates and aggregating data from 2 of the tables

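I am working with some email data and have 3 files: Sent, Opens (a subset of Sent), and Clicks (a subset of Opens). Basically, I want to join the Opens and Clicks files to the Sent file by SubID (the unique identifier).

In addition, there are 3 email deployments (JobID). For each JobID, I want to count how many times a SubID opened the email and how many links they clicked. Here is an example: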

JobID    SubID       Opened      Clicked?   #ofClicks
63809    44775286    0           0          0
89993    44775286    0           0          0
191443   44775286    0           0          0

63809    44775288    3           0          0
89993    44775288    1           0          0
191443   44775288    2           0          0

63809    44775490    4           1          3
89993    44775490    1           0          0
191443   44775490    1           0          0

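Basically, if a SubID is in the Opens file, they opened the email; if a SubID is in the Clicks file, they clicked it. The first 2 columns in this example come from the Sent file (although all 3 files have those two columns).

I tried to answer part of this (the first 4 columns) with a query, but it does not count the clicks or opens quite right. It counts all clicks and opens across every JobID and repeats the same total for each job, which is not what I want. I'm sure there is a way to do this with joins, but I'm still new to SQL and am struggling.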

WITH temptable AS
(
SELECT Staging_SendLog.SubID ,jobid,
( SELECT COUNT(0) FROM Staging_DailyOpens 
  WHERE SubscriberID = Staging_SendLog.SubID) AS opens,
( SELECT COUNT(0) FROM Staging_DailyClicks 
  WHERE SubscriberID = Staging_SendLog.SubID) AS clicks
FROM Staging_SendLog 
WHERE  JobID = 63809 OR JobID = 89993
)
SELECT subid, jobid, opens, clicks FROM temptable
GROUP BY  subID, JobID, opens, clicks
ORDER BY 1;

Can anyone help? I am using Microsoft SQL Server.

You are grouping incorrectly:

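WITH clicks AS (
  -- total clicks per subscriber (across all jobs)
  SELECT SubscriberID, COUNT(*) AS Clicks
  FROM Staging_DailyClicks
  GROUP BY SubscriberID
), opens AS (
  -- total opens per subscriber (across all jobs)
  SELECT SubscriberID, COUNT(*) AS Opens
  FROM Staging_DailyOpens
  GROUP BY SubscriberID
)
SELECT Staging_SendLog.SubID, Staging_SendLog.JobID,
   ISNULL(opens.Opens, 0)   AS Opens,
   ISNULL(clicks.Clicks, 0) AS Clicks
FROM Staging_SendLog
-- LEFT JOIN keeps subscribers with no opens or clicks
LEFT JOIN opens  ON opens.SubscriberID  = Staging_SendLog.SubID
LEFT JOIN clicks ON clicks.SubscriberID = Staging_SendLog.SubID
WHERE Staging_SendLog.JobID = 63809
   OR Staging_SendLog.JobID = 89993
ORDER BY 1;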

If you need the counts grouped by both JobID and SubID, you can try:

SELECT 
    s.JobID,
    s.SubID,
    ISNULL(c.ClickCount, 0) AS ClickCount,
    ISNULL(o.OpenCount, 0) AS OpenCount
FROM
    Staging_SendLog s
    LEFT JOIN (SELECT 
                    JobId, 
                    SubscriberId, 
                    COUNT(*) AS ClickCount
                FROM 
                    Staging_DailyClicks 
                WHERE 
                    JobID IN (63809,89993)
                GROUP BY 
                    JobId,
                    SubscriberId 
            ) c ON c.JobID = s.JobID AND c.SubscriberId = s.SubID             
    LEFT JOIN (SELECT 
                    JobId, 
                    SubscriberId, 
                    COUNT(*) AS OpenCount
                FROM 
                    Staging_DailyOpens
                WHERE
                    JobID IN (63809,89993) 
                GROUP BY 
                    JobId,
                    SubscriberId 
            ) o ON o.JobID = s.JobID AND o.SubscriberId = s.SubID  
WHERE  
    s.JobID IN (63809, 89993) 
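
To check the per-job query against the example in the question, here is a minimal self-contained sketch. The table variables @SendLog, @Opens and @Clicks are hypothetical stand-ins for Staging_SendLog, Staging_DailyOpens and Staging_DailyClicks, and the rows are made up to match a few lines of the sample output, assuming one row per open or click event:

```sql
-- Hypothetical stand-ins for the staging tables (assumed: one row per event).
DECLARE @SendLog TABLE (JobID int, SubID int);
DECLARE @Opens   TABLE (JobID int, SubscriberID int);
DECLARE @Clicks  TABLE (JobID int, SubscriberID int);

INSERT INTO @SendLog VALUES (63809, 44775286), (89993, 44775286),
                            (63809, 44775490), (89993, 44775490);
-- 44775490 opened job 63809 four times and job 89993 once
INSERT INTO @Opens VALUES (63809, 44775490), (63809, 44775490),
                          (63809, 44775490), (63809, 44775490),
                          (89993, 44775490);
-- ... and clicked three links in job 63809
INSERT INTO @Clicks VALUES (63809, 44775490), (63809, 44775490),
                           (63809, 44775490);

SELECT
    s.JobID,
    s.SubID,
    ISNULL(o.OpenCount, 0)  AS Opened,
    -- 1 if the subscriber clicked at least once in this job, else 0
    CASE WHEN c.ClickCount > 0 THEN 1 ELSE 0 END AS Clicked,
    ISNULL(c.ClickCount, 0) AS NumClicks
FROM @SendLog s
-- aggregate each event table first, then join one row per (JobID, subscriber)
LEFT JOIN (SELECT JobID, SubscriberID, COUNT(*) AS OpenCount
           FROM @Opens
           GROUP BY JobID, SubscriberID) o
       ON o.JobID = s.JobID AND o.SubscriberID = s.SubID
LEFT JOIN (SELECT JobID, SubscriberID, COUNT(*) AS ClickCount
           FROM @Clicks
           GROUP BY JobID, SubscriberID) c
       ON c.JobID = s.JobID AND c.SubscriberID = s.SubID
ORDER BY s.SubID, s.JobID;
```

Aggregating before joining matters here: if the raw Opens and Clicks rows were joined to the send log directly, each open row would pair with each click row for the same subscriber and job, and both counts would be inflated.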

If you only need the counts by SubID, you can use:

SELECT 
    s.SubID,
    SUM(ISNULL(c.ClickCount,0)) AS ClickCount,
    SUM(ISNULL(o.OpenCount,0)) AS OpenCount
FROM
    Staging_SendLog s
    LEFT JOIN (SELECT 
                    JobId, 
                    SubscriberId, 
                    COUNT(*) AS ClickCount
                FROM 
                    Staging_DailyClicks 
                WHERE
                    JobID IN (63809,89993)
                GROUP BY 
                    JobId,
                    SubscriberId 
            ) c ON c.JobID = s.JobID AND c.SubscriberId = s.SubID             
    LEFT JOIN (SELECT 
                    JobId, 
                    SubscriberId, 
                    COUNT(*) AS OpenCount
                FROM 
                    Staging_DailyOpens
                WHERE
                    JobID IN (63809,89993)
                GROUP BY 
                    JobId,
                    SubscriberId 
            ) o ON o.JobID = s.JobID AND o.SubscriberId = s.SubID    
WHERE  
    s.JobID IN (63809,89993)   
GROUP BY 
    s.SubID

Edit

I added the JobID filter to the subqueries so that they are filtered early in case these are large tables. This should help performance.
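
The original query from the question can also be repaired with a small change: correlate each subquery on JobID as well as on the subscriber. The sketch below is only an illustration under assumptions: the table variables are hypothetical stand-ins for the staging tables, the rows are invented, and the send log is assumed to hold one row per (JobID, SubID).

```sql
-- Hypothetical stand-ins for Staging_SendLog, Staging_DailyOpens, Staging_DailyClicks.
DECLARE @SendLog TABLE (JobID int, SubID int);
DECLARE @Opens   TABLE (JobID int, SubscriberID int);
DECLARE @Clicks  TABLE (JobID int, SubscriberID int);

INSERT INTO @SendLog VALUES (63809, 44775288), (89993, 44775288);
-- three opens for job 63809, one open for job 89993
INSERT INTO @Opens   VALUES (63809, 44775288), (63809, 44775288),
                            (63809, 44775288), (89993, 44775288);
-- @Clicks stays empty: this subscriber never clicked

SELECT
    s.SubID,
    s.JobID,
    (SELECT COUNT(*) FROM @Opens o
      WHERE o.SubscriberID = s.SubID
        AND o.JobID = s.JobID) AS opens,    -- added: match the job too
    (SELECT COUNT(*) FROM @Clicks c
      WHERE c.SubscriberID = s.SubID
        AND c.JobID = s.JobID) AS clicks    -- added: match the job too
FROM @SendLog s
WHERE s.JobID IN (63809, 89993)
ORDER BY s.SubID, s.JobID;
```

Without the extra JobID predicate, both rows would report the subscriber's total of 4 opens, which is the symptom described in the question. With it, job 63809 shows 3 opens and job 89993 shows 1.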