Grouping results when nesting queries

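I'm trying to get a 6-month trend of the entities in my database that meet certain criteria, but the problem is that I need to nest several levels deep to determine whether an entity meets the criteria.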

The entities are "members", who may have multiple "accounts", and I need to make sure that none of their accounts have certain flags set before I include them.

If I only wanted the count for one specific date (we keep historical data), I would do this:

SELECT COUNT(sup.SSN) 
FROM MemberSuppTable as sup 
WHERE  (
  sup.ProcessDate = @PROCESSDATE
  AND sup.MemberSuppID IN (
    SELECT summ.MemberSuppID 
    FROM MemberSummaryTable as summ
    WHERE  (
      summ.ProcessDate = @PROCESSDATE
      AND summ.AccountNumber IN (
        SELECT acct.AccountNumber 
        FROM AccountTable as acct
        WHERE ( 
          acct.ProcessDate = @PROCESSDATE
          --other criteria for account exclusion go here. 
        )
      )
    )
  )
)
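If the inner criteria select the accounts that pass, nested IN keeps a member as soon as any one of their accounts passes, which is weaker than "none of their accounts have the flag set". The sketch below contrasts the two on made-up data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and a hypothetical Flag column (1 = flagged) in place of the real exclusion criteria.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate INTEGER, MemberSuppID INTEGER, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate INTEGER, AccountNumber INTEGER, MemberSuppID INTEGER);
CREATE TABLE AccountTable       (ProcessDate INTEGER, AccountNumber INTEGER, Flag INTEGER);
""")
D = 20160430
conn.executemany("INSERT INTO MemberSuppTable VALUES (?, ?, ?)",
                 [(D, 1, "111"), (D, 2, "222"), (D, 3, "333")])
conn.executemany("INSERT INTO MemberSummaryTable VALUES (?, ?, ?)",
                 [(D, 10, 1), (D, 11, 1), (D, 20, 2), (D, 30, 3)])
# Member 1 has one clean and one flagged account; member 3 only a flagged one.
# Flag is a hypothetical stand-in for the real exclusion criteria.
conn.executemany("INSERT INTO AccountTable VALUES (?, ?, ?)",
                 [(D, 10, 0), (D, 11, 1), (D, 20, 0), (D, 30, 1)])

# Nested IN, as in the question: keeps a member if ANY account passes.
nested_in = conn.execute("""
SELECT COUNT(sup.SSN) FROM MemberSuppTable AS sup
WHERE sup.ProcessDate = :d
  AND sup.MemberSuppID IN (
    SELECT summ.MemberSuppID FROM MemberSummaryTable AS summ
    WHERE summ.ProcessDate = :d
      AND summ.AccountNumber IN (
        SELECT acct.AccountNumber FROM AccountTable AS acct
        WHERE acct.ProcessDate = :d AND acct.Flag = 0))
""", {"d": D}).fetchone()[0]

# NOT EXISTS: keeps a member only if NONE of their accounts is flagged.
# (A member with no accounts at all would also pass this test.)
none_flagged = conn.execute("""
SELECT COUNT(sup.SSN) FROM MemberSuppTable AS sup
WHERE sup.ProcessDate = :d
  AND NOT EXISTS (
    SELECT 1 FROM MemberSummaryTable AS summ
    JOIN AccountTable AS acct
      ON acct.AccountNumber = summ.AccountNumber
     AND acct.ProcessDate  = summ.ProcessDate
    WHERE summ.ProcessDate  = sup.ProcessDate
      AND summ.MemberSuppID = sup.MemberSuppID
      AND acct.Flag = 1)
""", {"d": D}).fetchone()[0]

print(nested_in, none_flagged)  # 2 1
```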

MemberSuppTable holds high-level information about each member:

(ID, FirstAccountOpenDate, status, etc)

MemberSummaryTable ties accounts to the members in MemberSuppTable:

(AccountNumber, MemberSuppID, ...)

Now I'm trying to get the counts for the month-end process dates, grouped by process date, in a single query.

So, where the query above returns

ssn count
----------
1,000,000

I want:

process date | ssn count
------------------------
20160430     | 8,000,000
20160531     | 8,500,000
...          | ...
20160331     | 1,000,000

This is what I've come up with so far (see below for why it doesn't work):

WITH valid_dates AS (
  SELECT D.ProcessDate 
  FROM arcu.vwARCUProcessDates AS D 
  WHERE D.FullDate = D.MonthEndDate 
    AND D.ProcessDate >= @SDATE
)


SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN) 
FROM MemberSuppTable as sup 
WHERE (
  sup.ProcessDate IN (SELECT * FROM valid_dates)
  AND sup.MemberSuppID IN (
    SELECT summ.MemberSuppID
    FROM MemberSummaryTable as summ
    WHERE  (
      summ.ProcessDate IN (SELECT * FROM valid_dates)
      AND summ.AccountNumber IN (
        SELECT acct.AccountNumber 
        FROM AccountTable as acct
        WHERE ( 
          acct.ProcessDate IN (SELECT * FROM valid_dates)
          -- ... (other criteria for account exclusion)
        )
      )
    )
  )
)
GROUP BY (sup.ProcessDate)

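With the query above, though, I believe that a member who meets the criteria on any process date in valid_dates will be included in every group, because nothing ties the process dates in the subqueries back to sup.ProcessDate.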
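The suspicion can be checked with a small reproduction. This sketch uses Python's built-in sqlite3 module instead of SQL Server, made-up rows, a VALUES-based CTE in place of the view, and a hypothetical Flag column as the account criterion. The account is clean in April but flagged in May, yet the member is counted in both groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate INTEGER, MemberSuppID INTEGER, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate INTEGER, AccountNumber INTEGER, MemberSuppID INTEGER);
CREATE TABLE AccountTable       (ProcessDate INTEGER, AccountNumber INTEGER, Flag INTEGER);
-- Member 1 and account 10 appear in both month-end snapshots.
INSERT INTO MemberSuppTable    VALUES (20160430, 1, '111'), (20160531, 1, '111');
INSERT INTO MemberSummaryTable VALUES (20160430, 10, 1),    (20160531, 10, 1);
-- The account is clean in April but flagged in May (Flag is hypothetical).
INSERT INTO AccountTable       VALUES (20160430, 10, 0),    (20160531, 10, 1);
""")

# The question's query (syntax fixed), with a CTE standing in for the view.
rows = conn.execute("""
WITH valid_dates(ProcessDate) AS (VALUES (20160430), (20160531))
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
WHERE sup.ProcessDate IN (SELECT * FROM valid_dates)
  AND sup.MemberSuppID IN (
    SELECT summ.MemberSuppID FROM MemberSummaryTable AS summ
    WHERE summ.ProcessDate IN (SELECT * FROM valid_dates)
      AND summ.AccountNumber IN (
        SELECT acct.AccountNumber FROM AccountTable AS acct
        WHERE acct.ProcessDate IN (SELECT * FROM valid_dates)
          AND acct.Flag = 0))
GROUP BY sup.ProcessDate
ORDER BY sup.ProcessDate
""").fetchall()

# May is counted too: the April match leaks into every group
# because the subqueries never compare their dates with sup.ProcessDate.
print(rows)  # [(20160430, 1), (20160531, 1)]
```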

Can anyone help? (I'm new to SQL, so forgive me if I'm missing something simple.)

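IN clauses are great for this kind of query. They are more readable than joins, because you show clearly which tables you select data from and which tables you access only to check whether a record exists. Your query is well structured and shows that you have already put some thought into it.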

However, your query would be more readable without the unnecessary aliases and parentheses.

Anyway, I guess you want to use the same process dates you found in the subqueries, so extend your IN clauses accordingly:

select processdate, count(distinct ssn) 
from membersupptable 
where (processdate, membersuppid) in 
(
  select processdate, membersuppid
  from membersummarytable
  where (processdate, accountnumber) in
  (
    select processdate, accountnumber 
    from accounttable
    where processdate in 
    (
      select processdate 
      from arcu.vwarcuprocessdates
      where fulldate = monthenddate 
      and processdate >= @sdate
    )
  )
)
group by processdate;
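One caveat: row-value comparisons such as (processdate, membersuppid) IN (...) work in several databases (for example PostgreSQL, Oracle and SQLite 3.15+), but SQL Server, which the @ variables in the question suggest, does not support them. The sketch below is a hedged check in Python's sqlite3 with made-up rows, a hypothetical Flag criterion, and a CTE standing in for vwARCUProcessDates. It runs this answer's query next to an equivalent correlated EXISTS version, a pattern that SQL Server does accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate INTEGER, MemberSuppID INTEGER, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate INTEGER, AccountNumber INTEGER, MemberSuppID INTEGER);
CREATE TABLE AccountTable       (ProcessDate INTEGER, AccountNumber INTEGER, Flag INTEGER);
INSERT INTO MemberSuppTable    VALUES (20160430, 1, '111'), (20160531, 1, '111');
INSERT INTO MemberSummaryTable VALUES (20160430, 10, 1),    (20160531, 10, 1);
-- The account is clean in April but flagged in May (Flag is hypothetical).
INSERT INTO AccountTable       VALUES (20160430, 10, 0),    (20160531, 10, 1);
""")

# Hypothetical stand-in for the month-end dates from vwARCUProcessDates.
VALID = "WITH valid_dates(ProcessDate) AS (VALUES (20160430), (20160531)) "

# Row-value IN, as in this answer (accepted by SQLite 3.15+, not by SQL Server).
tuple_in = conn.execute(VALID + """
SELECT ProcessDate, COUNT(DISTINCT SSN)
FROM MemberSuppTable
WHERE (ProcessDate, MemberSuppID) IN (
  SELECT ProcessDate, MemberSuppID FROM MemberSummaryTable
  WHERE (ProcessDate, AccountNumber) IN (
    SELECT ProcessDate, AccountNumber FROM AccountTable
    WHERE ProcessDate IN (SELECT ProcessDate FROM valid_dates)
      AND Flag = 0))
GROUP BY ProcessDate ORDER BY ProcessDate
""").fetchall()

# Same logic with correlated EXISTS subqueries; this pattern also works in SQL Server.
exists_rows = conn.execute(VALID + """
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
WHERE sup.ProcessDate IN (SELECT ProcessDate FROM valid_dates)
  AND EXISTS (
    SELECT 1 FROM MemberSummaryTable AS summ
    WHERE summ.ProcessDate  = sup.ProcessDate
      AND summ.MemberSuppID = sup.MemberSuppID
      AND EXISTS (
        SELECT 1 FROM AccountTable AS acct
        WHERE acct.ProcessDate   = summ.ProcessDate
          AND acct.AccountNumber = summ.AccountNumber
          AND acct.Flag = 0))
GROUP BY sup.ProcessDate ORDER BY sup.ProcessDate
""").fetchall()

print(tuple_in, exists_rows)  # [(20160430, 1)] [(20160430, 1)]
```

Both versions drop May, because each subquery now only matches rows from the same process date.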

First, I would rewrite your first query using INNER JOIN instead of WHERE ... IN:

SELECT COUNT(DISTINCT sup.SSN) 
FROM MemberSuppTable as sup 
INNER JOIN MemberSummaryTable AS summ
    ON summ.MemberSuppID = sup.MemberSuppID
INNER JOIN AccountTable AS acct
    ON acct.AccountNumber = summ.AccountNumber
WHERE sup.ProcessDate  = @PROCESSDATE
  AND summ.ProcessDate = @PROCESSDATE
  AND acct.ProcessDate = @PROCESSDATE
  -- other criteria for account exclusion go here.

This looks more compact and (IMHO) is more readable.
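Because the joins produce one row per member/account pair, COUNT(sup.SSN) without DISTINCT would count a member once per qualifying account. A quick check with Python's sqlite3 standing in for SQL Server, on made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate INTEGER, MemberSuppID INTEGER, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate INTEGER, AccountNumber INTEGER, MemberSuppID INTEGER);
CREATE TABLE AccountTable       (ProcessDate INTEGER, AccountNumber INTEGER, Flag INTEGER);
-- Member 1 has two accounts, member 2 has one.
INSERT INTO MemberSuppTable    VALUES (20160430, 1, '111'), (20160430, 2, '222');
INSERT INTO MemberSummaryTable VALUES (20160430, 10, 1), (20160430, 11, 1), (20160430, 20, 2);
INSERT INTO AccountTable       VALUES (20160430, 10, 0), (20160430, 11, 0), (20160430, 20, 0);
""")

# The JOIN rewrite, with the aggregate filled in below.
JOIN_QUERY = """
SELECT {agg}
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ ON summ.MemberSuppID = sup.MemberSuppID
INNER JOIN AccountTable AS acct ON acct.AccountNumber = summ.AccountNumber
WHERE sup.ProcessDate = :d AND summ.ProcessDate = :d AND acct.ProcessDate = :d
"""
params = {"d": 20160430}
plain = conn.execute(JOIN_QUERY.format(agg="COUNT(sup.SSN)"), params).fetchone()[0]
distinct = conn.execute(JOIN_QUERY.format(agg="COUNT(DISTINCT sup.SSN)"), params).fetchone()[0]

# The join emits one row per (member, account) pair, so member 1 appears twice.
print(plain, distinct)  # 3 2
```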

Now I would change the query so that @PROCESSDATE appears only once:

SELECT COUNT(DISTINCT sup.SSN) 
FROM MemberSuppTable as sup 
INNER JOIN MemberSummaryTable AS summ
    ON summ.MemberSuppID = sup.MemberSuppID
INNER JOIN AccountTable AS acct
    ON acct.AccountNumber = summ.AccountNumber
WHERE sup.ProcessDate  = @PROCESSDATE
  AND summ.ProcessDate = sup.ProcessDate
  AND acct.ProcessDate = sup.ProcessDate
  -- other criteria for account exclusion go here.

You could keep these conditions in the WHERE clause, but I prefer them in the ON clause:

SELECT COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ 
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate = sup.ProcessDate
WHERE sup.ProcessDate = @PROCESSDATE
  -- other criteria for account exclusion go here.
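For INNER JOINs it makes no difference to the result whether the date conditions sit in WHERE or in ON; placement only starts to matter with outer joins. A quick sketch in Python's sqlite3 (a stand-in for SQL Server) with made-up data, comparing the WHERE form and the ON form above for two process dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate INTEGER, MemberSuppID INTEGER, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate INTEGER, AccountNumber INTEGER, MemberSuppID INTEGER);
CREATE TABLE AccountTable       (ProcessDate INTEGER, AccountNumber INTEGER, Flag INTEGER);
-- Member 1 appears on both dates, member 2 only in April.
INSERT INTO MemberSuppTable    VALUES (20160430, 1, '111'), (20160531, 1, '111'), (20160430, 2, '222');
INSERT INTO MemberSummaryTable VALUES (20160430, 10, 1), (20160531, 10, 1), (20160430, 20, 2);
INSERT INTO AccountTable       VALUES (20160430, 10, 0), (20160531, 10, 0), (20160430, 20, 0);
""")

# Date correlation in WHERE (@PROCESSDATE written once).
IN_WHERE = """
SELECT COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ ON summ.MemberSuppID = sup.MemberSuppID
INNER JOIN AccountTable AS acct ON acct.AccountNumber = summ.AccountNumber
WHERE sup.ProcessDate  = :d
  AND summ.ProcessDate = sup.ProcessDate
  AND acct.ProcessDate = sup.ProcessDate
"""
# Same correlation moved into the ON clauses.
IN_ON = """
SELECT COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate   = sup.ProcessDate
WHERE sup.ProcessDate = :d
"""
results = {}
for d in (20160430, 20160531):
    where_count = conn.execute(IN_WHERE, {"d": d}).fetchone()[0]
    on_count = conn.execute(IN_ON, {"d": d}).fetchone()[0]
    results[d] = (where_count, on_count)

print(results)  # {20160430: (2, 2), 20160531: (1, 1)}
```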

Now it's easy to get the COUNT for each ProcessDate:

SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable as sup 
INNER JOIN MemberSummaryTable AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate   = sup.ProcessDate
-- WHERE criteria for account exclusion go here. 
GROUP BY sup.ProcessDate
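A sketch of this grouped query on made-up data, with Python's sqlite3 standing in for SQL Server and a hypothetical Flag = 0 as the account criterion. Member 1's account is flagged in May only, and the May group correctly drops that member, because each snapshot is judged only against accounts from the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate INTEGER, MemberSuppID INTEGER, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate INTEGER, AccountNumber INTEGER, MemberSuppID INTEGER);
CREATE TABLE AccountTable       (ProcessDate INTEGER, AccountNumber INTEGER, Flag INTEGER);
-- Members 1 and 2 appear in both snapshots.
INSERT INTO MemberSuppTable VALUES
  (20160430, 1, '111'), (20160430, 2, '222'), (20160531, 1, '111'), (20160531, 2, '222');
INSERT INTO MemberSummaryTable VALUES
  (20160430, 10, 1), (20160430, 20, 2), (20160531, 10, 1), (20160531, 20, 2);
-- Member 1's account is flagged in May only.
INSERT INTO AccountTable VALUES
  (20160430, 10, 0), (20160430, 20, 0), (20160531, 10, 1), (20160531, 20, 0);
""")

rows = conn.execute("""
SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate   = sup.ProcessDate
WHERE acct.Flag = 0          -- hypothetical account criterion
GROUP BY sup.ProcessDate
ORDER BY sup.ProcessDate
""").fetchall()

print(rows)  # [(20160430, 2), (20160531, 1)]
```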

To also filter by the "valid_dates", it's just one extra JOIN and a few WHERE conditions:

SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable as sup 
INNER JOIN MemberSummaryTable AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate   = sup.ProcessDate
INNER JOIN arcu.vwARCUProcessDates AS d
    ON d.ProcessDate = sup.ProcessDate
WHERE d.FullDate = d.MonthEndDate 
  AND d.ProcessDate >= @SDATE
  -- AND criteria for account exclusion go here.
GROUP BY sup.ProcessDate

GROUP BY d.ProcessDate may perform better, but if you switch to it, don't forget to adjust the SELECT list as well.
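A sketch of the filtered query and the GROUP BY d.ProcessDate variant, using Python's sqlite3 with made-up rows. SQLite has no schemas, so the view is replaced by a hypothetical table with the same three columns in an attached database named arcu, which lets the qualified name arcu.vwARCUProcessDates still resolve. The layout of that table (one row per daily snapshot, with ProcessDate equal to FullDate) is an assumption for illustration. Since the join makes d.ProcessDate equal to sup.ProcessDate on every row, both grouping keys return the same counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach an empty in-memory database as "arcu" to mimic the arcu schema.
conn.execute("ATTACH DATABASE ':memory:' AS arcu")
conn.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate INTEGER, MemberSuppID INTEGER, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate INTEGER, AccountNumber INTEGER, MemberSuppID INTEGER);
CREATE TABLE AccountTable       (ProcessDate INTEGER, AccountNumber INTEGER, Flag INTEGER);
-- Hypothetical stand-in for the view.
CREATE TABLE arcu.vwARCUProcessDates (ProcessDate INTEGER, FullDate INTEGER, MonthEndDate INTEGER);
""")
# (snapshot date, month-end date of that month); member 1 is clean on every date.
dates = [(20160331, 20160331), (20160429, 20160430), (20160430, 20160430), (20160531, 20160531)]
for day, month_end in dates:
    conn.execute("INSERT INTO arcu.vwARCUProcessDates VALUES (?, ?, ?)", (day, day, month_end))
    conn.execute("INSERT INTO MemberSuppTable VALUES (?, 1, '111')", (day,))
    conn.execute("INSERT INTO MemberSummaryTable VALUES (?, 10, 1)", (day,))
    conn.execute("INSERT INTO AccountTable VALUES (?, 10, 0)", (day,))

QUERY = """
SELECT {key}, COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate   = sup.ProcessDate
INNER JOIN arcu.vwARCUProcessDates AS d
    ON d.ProcessDate = sup.ProcessDate
WHERE d.FullDate = d.MonthEndDate
  AND d.ProcessDate >= :sdate
  AND acct.Flag = 0          -- hypothetical account criterion
GROUP BY {key}
ORDER BY {key}
"""
params = {"sdate": 20160401}
by_sup = conn.execute(QUERY.format(key="sup.ProcessDate"), params).fetchall()
by_d = conn.execute(QUERY.format(key="d.ProcessDate"), params).fetchall()

# 20160331 is before @SDATE and 20160429 is not a month end, so both are dropped.
print(by_sup)  # [(20160430, 1), (20160531, 1)]
print(by_d)    # [(20160430, 1), (20160531, 1)]
```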

Edit: As pointed out in the comments, the DISTINCT keyword is needed if each SSN should be counted only once, so I have edited the solution.

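Note also that even with DISTINCT, the first query is not always equivalent to the original one. If sup.SSN is not unique within a process date, the results may differ: the original COUNT(sup.SSN) counts every qualifying MemberSuppTable row, while COUNT(DISTINCT sup.SSN) counts each SSN only once.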
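A minimal illustration with made-up data, again using Python's sqlite3 as a stand-in for SQL Server: two MemberSuppTable rows share one SSN on the same date, so the original nested-IN form returns 2 while the JOIN rewrite returns 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MemberSuppTable    (ProcessDate INTEGER, MemberSuppID INTEGER, SSN TEXT);
CREATE TABLE MemberSummaryTable (ProcessDate INTEGER, AccountNumber INTEGER, MemberSuppID INTEGER);
CREATE TABLE AccountTable       (ProcessDate INTEGER, AccountNumber INTEGER, Flag INTEGER);
-- Two MemberSuppTable rows share one SSN on the same date.
INSERT INTO MemberSuppTable    VALUES (20160430, 1, '111'), (20160430, 2, '111');
INSERT INTO MemberSummaryTable VALUES (20160430, 10, 1), (20160430, 20, 2);
INSERT INTO AccountTable       VALUES (20160430, 10, 0), (20160430, 20, 0);
""")
params = {"d": 20160430}

# The question's original form: counts qualifying MemberSuppTable rows.
original = conn.execute("""
SELECT COUNT(sup.SSN) FROM MemberSuppTable AS sup
WHERE sup.ProcessDate = :d
  AND sup.MemberSuppID IN (
    SELECT summ.MemberSuppID FROM MemberSummaryTable AS summ
    WHERE summ.ProcessDate = :d
      AND summ.AccountNumber IN (
        SELECT acct.AccountNumber FROM AccountTable AS acct
        WHERE acct.ProcessDate = :d))
""", params).fetchone()[0]

# The JOIN rewrite: counts distinct SSNs.
rewritten = conn.execute("""
SELECT COUNT(DISTINCT sup.SSN)
FROM MemberSuppTable AS sup
INNER JOIN MemberSummaryTable AS summ
    ON  summ.MemberSuppID = sup.MemberSuppID
    AND summ.ProcessDate  = sup.ProcessDate
INNER JOIN AccountTable AS acct
    ON  acct.AccountNumber = summ.AccountNumber
    AND acct.ProcessDate   = sup.ProcessDate
WHERE sup.ProcessDate = :d
""", params).fetchone()[0]

print(original, rewritten)  # 2 1
```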