Grouping results when nesting queries
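I'm trying to get a 6-month trend of entities in my database that meet certain criteria, but the problem is that I need to nest a few levels deep to determine whether an entity meets the criteria.
The entities are "members", who may have several "accounts", and I need to make sure that none of their accounts have certain flags set before I include them.
If I just wanted the count for one particular date (we keep historical data), I would do this: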
    SELECT COUNT(sup.SSN)
    FROM MemberSuppTable as sup
    WHERE (
        sup.ProcessDate = @PROCESSDATE
        AND sup.MemberSuppID IN (
            SELECT summ.MemberSuppID
            FROM MemberSummaryTable as summ
            WHERE (
                summ.ProcessDate = @PROCESSDATE
                AND summ.AccountNumber IN (
                    SELECT acct.AccountNumber
                    FROM AccountTable as acct
                    WHERE (
                        acct.ProcessDate = @PROCESSDATE
                        --other criteria for account exclusion go here.
                    )
                )
            )
        )
    )
MemberSuppTable holds high-level information about members:
(ID, FirstAccountOpenDate, status, etc)
MemberSummaryTable ties accounts to the members in MemberSuppTable:
(AccountNumber, MemberSuppID, ...)
Now I'm trying to get the counts for the month-end process dates, grouped by process date, in a single query.
So, where the query above would return
ssn count
----------
1,000,000
I want:
process date | ssn count
------------------------
20160430 | 8,000,000
20160531 | 8,500,000
... | ...
20160331 | 1,000,000
So far, I've come up with the following (see below for why it doesn't work):
    WITH valid_dates AS (
        SELECT d.ProcessDate
        FROM arcu.vwARCUProcessDates AS d
        WHERE d.FullDate = d.MonthEndDate
        AND d.ProcessDate >= @SDATE
    )
    SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
    FROM MemberSuppTable as sup
    WHERE (
        sup.ProcessDate IN (SELECT * FROM valid_dates)
        AND sup.MemberSuppID IN (
            SELECT summ.MemberSuppID
            FROM MemberSummaryTable as summ
            WHERE (
                summ.ProcessDate IN (SELECT * FROM valid_dates)
                AND summ.AccountNumber IN (
                    SELECT acct.AccountNumber
                    FROM AccountTable as acct
                    WHERE (
                        acct.ProcessDate IN (SELECT * FROM valid_dates)
                        ...
                    )
                )
            )
        )
    )
    GROUP BY (sup.ProcessDate)
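With the above query, though, I believe that if a member meets the criteria for any process date in valid_dates, they will be included in every group.
Can anyone help me out? (I'm new to SQL, so please forgive me if I'm missing something simple.)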
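IN clauses are well suited to this kind of query. They are more readable than joins, because they show clearly which tables you select data from and which tables you only look at to check whether a record exists. Your query is well structured and shows that you have put some thought into it.
It would be even more readable without the unnecessary aliases and parentheses, though.
Anyway, I guess you want to use the same process date that you find in the subqueries, so extend your IN clauses accordingly: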
    select processdate, count(distinct ssn)
    from membersupptable
    where (processdate, membersuppid) in
    (
      select processdate, membersuppid
      from membersummarytable
      where (processdate, accountnumber) in
      (
        select processdate, accountnumber
        from accounttable
        where processdate in
        (
          select processdate
          from arcu.vwarcuprocessdates
          where fulldate = monthenddate
          and processdate >= @sdate
        )
      )
    )
    group by processdate;
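To check that tying the process dates together really changes the result, here is a small sketch that runs both shapes of the query against toy data in SQLite, driven from Python. Everything in it is an assumption made for illustration: the trimmed column lists, the `ValidDates` table standing in for the month-end filter on the process-date view, and the `Flagged` column standing in for the elided account criteria. Row-value IN is not accepted by every engine (SQL Server, for one, rejects it), so the sketch expresses the same correlation with nested EXISTS.

```python
import sqlite3

# Hypothetical miniature of the schema; only the columns the queries touch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ValidDates (ProcessDate TEXT);
    CREATE TABLE MemberSuppTable (ProcessDate TEXT, MemberSuppID INTEGER, SSN TEXT);
    CREATE TABLE MemberSummaryTable (ProcessDate TEXT, MemberSuppID INTEGER, AccountNumber INTEGER);
    CREATE TABLE AccountTable (ProcessDate TEXT, AccountNumber INTEGER, Flagged INTEGER);

    INSERT INTO ValidDates VALUES ('20160331'), ('20160430');
    -- Two members, each with one account, on both dates.
    INSERT INTO MemberSuppTable VALUES
        ('20160331', 1, 'A'), ('20160331', 2, 'B'),
        ('20160430', 1, 'A'), ('20160430', 2, 'B');
    INSERT INTO MemberSummaryTable VALUES
        ('20160331', 1, 10), ('20160331', 2, 20),
        ('20160430', 1, 10), ('20160430', 2, 20);
    -- Account 20 fails the (assumed) criteria on 20160331 only.
    INSERT INTO AccountTable VALUES
        ('20160331', 10, 0), ('20160331', 20, 1),
        ('20160430', 10, 0), ('20160430', 20, 0);
""")

# Shape of the question's query: each level checks "any valid date" on its own.
UNCORRELATED = """
    SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
    FROM MemberSuppTable AS sup
    WHERE sup.ProcessDate IN (SELECT ProcessDate FROM ValidDates)
      AND sup.MemberSuppID IN (
          SELECT summ.MemberSuppID
          FROM MemberSummaryTable AS summ
          WHERE summ.ProcessDate IN (SELECT ProcessDate FROM ValidDates)
            AND summ.AccountNumber IN (
                SELECT acct.AccountNumber
                FROM AccountTable AS acct
                WHERE acct.ProcessDate IN (SELECT ProcessDate FROM ValidDates)
                  AND acct.Flagged = 0))
    GROUP BY sup.ProcessDate
    ORDER BY sup.ProcessDate
"""

# Same nesting, but every level must match the outer row's process date.
CORRELATED = """
    SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
    FROM MemberSuppTable AS sup
    WHERE sup.ProcessDate IN (SELECT ProcessDate FROM ValidDates)
      AND EXISTS (
          SELECT 1
          FROM MemberSummaryTable AS summ
          WHERE summ.ProcessDate = sup.ProcessDate
            AND summ.MemberSuppID = sup.MemberSuppID
            AND EXISTS (
                SELECT 1
                FROM AccountTable AS acct
                WHERE acct.ProcessDate = summ.ProcessDate
                  AND acct.AccountNumber = summ.AccountNumber
                  AND acct.Flagged = 0))
    GROUP BY sup.ProcessDate
    ORDER BY sup.ProcessDate
"""

uncorrelated = conn.execute(UNCORRELATED).fetchall()
correlated = conn.execute(CORRELATED).fetchall()
print(uncorrelated)  # [('20160331', 2), ('20160430', 2)]
print(correlated)    # [('20160331', 1), ('20160430', 2)]
```

On this data the uncorrelated query reports two members for both dates, even though member 2's only account is flagged on 20160331. The correlated query reports one member for that date.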
First, I would rewrite your first query using INNER JOIN instead of WHERE .. IN:
    SELECT COUNT(DISTINCT sup.SSN)
    FROM MemberSuppTable as sup
    INNER JOIN MemberSummaryTable AS summ
        ON summ.MemberSuppID = sup.MemberSuppID
    INNER JOIN AccountTable AS acct
        ON acct.AccountNumber = summ.AccountNumber
    WHERE sup.ProcessDate = @PROCESSDATE
      AND summ.ProcessDate = @PROCESSDATE
      AND acct.ProcessDate = @PROCESSDATE
    -- other criteria for account exclusion go here.
This looks more compact and (IMHO) more readable.
Now I'll change the query so that @PROCESSDATE appears only once:
    SELECT COUNT(DISTINCT sup.SSN)
    FROM MemberSuppTable as sup
    INNER JOIN MemberSummaryTable AS summ
        ON summ.MemberSuppID = sup.MemberSuppID
    INNER JOIN AccountTable AS acct
        ON acct.AccountNumber = summ.AccountNumber
    WHERE sup.ProcessDate = @PROCESSDATE
      AND summ.ProcessDate = sup.ProcessDate
      AND acct.ProcessDate = sup.ProcessDate
    -- other criteria for account exclusion go here.
You could leave the conditions in the WHERE clause, but I prefer them in the ON clause:
    SELECT COUNT(DISTINCT sup.SSN)
    FROM MemberSuppTable AS sup
    INNER JOIN MemberSummaryTable AS summ
        ON summ.MemberSuppID = sup.MemberSuppID
       AND summ.ProcessDate = sup.ProcessDate
    INNER JOIN AccountTable AS acct
        ON acct.AccountNumber = summ.AccountNumber
       AND acct.ProcessDate = sup.ProcessDate
    WHERE sup.ProcessDate = @PROCESSDATE
    -- other criteria for account exclusion go here.
Now it is easy to get the COUNT for each ProcessDate:
    SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
    FROM MemberSuppTable as sup
    INNER JOIN MemberSummaryTable AS summ
        ON summ.MemberSuppID = sup.MemberSuppID
       AND summ.ProcessDate = sup.ProcessDate
    INNER JOIN AccountTable AS acct
        ON acct.AccountNumber = summ.AccountNumber
       AND acct.ProcessDate = sup.ProcessDate
    -- WHERE criteria for account exclusion go here.
    GROUP BY sup.ProcessDate
To also filter by the "valid_dates", it is just one extra JOIN and a couple of WHERE conditions:
    SELECT sup.ProcessDate, COUNT(DISTINCT sup.SSN)
    FROM MemberSuppTable as sup
    INNER JOIN MemberSummaryTable AS summ
        ON summ.MemberSuppID = sup.MemberSuppID
       AND summ.ProcessDate = sup.ProcessDate
    INNER JOIN AccountTable AS acct
        ON acct.AccountNumber = summ.AccountNumber
       AND acct.ProcessDate = sup.ProcessDate
    INNER JOIN arcu.vwARCUProcessDates AS d
        ON d.ProcessDate = sup.ProcessDate
    WHERE d.FullDate = d.MonthEndDate
      AND d.ProcessDate >= @SDATE
    -- AND criteria for account exclusion go here.
    GROUP BY sup.ProcessDate
For better performance, GROUP BY d.ProcessDate might be preferable, but don't forget to adjust the SELECT list to match.
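Edit:
As mentioned in the comments, if every SSN must be counted only once, the DISTINCT keyword is required, so I have edited the solution accordingly.
Note also that, even with DISTINCT, the first query is not always equivalent to the original one. If sup.SSN is not unique, the two queries may return different results.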
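The effect of DISTINCT, and the caveat about non-unique SSNs, can be seen on a toy data set. This is only a sketch in Python with SQLite, not your real schema: the trimmed columns, the sample rows, and the SSN 'B' shared by two member rows are all made up for illustration.

```python
import sqlite3

# Hypothetical data for a single process date:
#   member 1 (SSN 'A') has two accounts,
#   members 2 and 3 share SSN 'B' and have one account each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MemberSuppTable (ProcessDate TEXT, MemberSuppID INTEGER, SSN TEXT);
    CREATE TABLE MemberSummaryTable (ProcessDate TEXT, MemberSuppID INTEGER, AccountNumber INTEGER);
    CREATE TABLE AccountTable (ProcessDate TEXT, AccountNumber INTEGER);

    INSERT INTO MemberSuppTable VALUES
        ('20160430', 1, 'A'), ('20160430', 2, 'B'), ('20160430', 3, 'B');
    INSERT INTO MemberSummaryTable VALUES
        ('20160430', 1, 10), ('20160430', 1, 11),
        ('20160430', 2, 20), ('20160430', 3, 30);
    INSERT INTO AccountTable VALUES
        ('20160430', 10), ('20160430', 11), ('20160430', 20), ('20160430', 30);
""")

# The question's single-date query: IN never multiplies rows.
IN_SQL = """
    SELECT COUNT(sup.SSN)
    FROM MemberSuppTable AS sup
    WHERE sup.ProcessDate = :d
      AND sup.MemberSuppID IN (
          SELECT summ.MemberSuppID
          FROM MemberSummaryTable AS summ
          WHERE summ.ProcessDate = :d
            AND summ.AccountNumber IN (
                SELECT acct.AccountNumber
                FROM AccountTable AS acct
                WHERE acct.ProcessDate = :d))
"""

# The JOIN rewrite; {expr} is filled with either sup.SSN or DISTINCT sup.SSN.
JOIN_SQL = """
    SELECT COUNT({expr})
    FROM MemberSuppTable AS sup
    INNER JOIN MemberSummaryTable AS summ
        ON summ.MemberSuppID = sup.MemberSuppID
       AND summ.ProcessDate = sup.ProcessDate
    INNER JOIN AccountTable AS acct
        ON acct.AccountNumber = summ.AccountNumber
       AND acct.ProcessDate = sup.ProcessDate
    WHERE sup.ProcessDate = :d
"""

params = {"d": "20160430"}
in_count = conn.execute(IN_SQL, params).fetchone()[0]
join_plain = conn.execute(JOIN_SQL.format(expr="sup.SSN"), params).fetchone()[0]
join_distinct = conn.execute(JOIN_SQL.format(expr="DISTINCT sup.SSN"), params).fetchone()[0]

print(in_count)       # 3: one per member row
print(join_plain)     # 4: member 1 is counted once per account
print(join_distinct)  # 2: the two rows sharing SSN 'B' collapse into one
```

Without DISTINCT the join counts member 1 twice. With DISTINCT it returns 2 where the original IN query returns 3, because two member rows share an SSN. If one count per member row is what you need, counting DISTINCT sup.MemberSuppID would be one way to get it, assuming that column identifies a member within a process date.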