SQL Server: finding if a client exists (and the account they own) in MULTIPLE columns of the accounts table
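I am trying to get a list of clients who own (or co-own) an account, together with the account number.
Our client table holds our client data, identified by clientId (the table also contains demographic details and contact information).
Our accounts table holds account information, where a clientId can appear in any 1 of 5 columns (an account can have up to 5 co-owners).
My code looks like this, but it is painfully slow. Is this the right approach? Is there a cheaper way?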
SELECT
c.*, aa.accountNo
FROM
client AS c, accounts AS aa
WHERE
EXISTS (SELECT 1
FROM accounts AS a
WHERE CAST(a.Account_Date AS Date) >= '2010-11-15'
AND CAST(a.Account_Date AS Date) <= '2017-04-24'
AND c.clientId IN (a.Owner1, a.Owner2, a.Owner3, a.Owner4, a.Owner5))
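Your query explodes the data set (client and accounts are cross joined) and scans accounts twice. Also, there is no need to cast the column side to fit a date range; an open-ended range on the bare column does the job (casting the column may not always lead to a scan, but it's still not great). Try: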
SELECT c.*, a.accountNo
FROM dbo.accounts AS a
CROSS APPLY
(
VALUES(Owner1),(Owner2),(Owner3),(Owner4),(Owner5)
) AS ac(clientId)
INNER JOIN dbo.client AS c
ON c.clientId = ac.clientId
WHERE a.Account_Date >= '20101115'
AND a.Account_Date < '20170425';
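The upper bound above is written as < '20170425' rather than <= '20170424'. A minimal sketch of why that matters, assuming Account_Date is a datetime column that carries a time component (the question does not state its type); the table variable and its rows are made up for illustration:

```sql
-- Hypothetical sample data; Account_Date is assumed to be datetime.
DECLARE @t TABLE (Account_Date datetime NOT NULL);

INSERT INTO @t (Account_Date)
VALUES ('20170424 00:00'), ('20170424 15:30'), ('20170425 00:00');

-- Closed upper bound on the bare column: '20170424' means midnight,
-- so the 15:30 row on the last day is missed.
SELECT Account_Date
FROM @t
WHERE Account_Date >= '20101115'
  AND Account_Date <= '20170424';

-- Half-open range: both rows dated 2017-04-24 are returned,
-- and the row at midnight on 2017-04-25 is excluded.
SELECT Account_Date
FROM @t
WHERE Account_Date >= '20101115'
  AND Account_Date <  '20170425';
```

The half-open form gives the same rows as casting the column to date and comparing with <= '2017-04-24', without wrapping the column in a function.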
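Your main problem is a lack of normalization. You should not have five Owner columns. Instead, you should have a separate AccountOwner table. Then you can just join it directly.
This is effectively what you get from @AaronBertrand's answer, except that the unpivoted set in that answer cannot be indexed, as it is virtual.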
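One possible way to build such a table from the existing data is sketched below. The column types, the constraint name, and the assumption that accountNo uniquely identifies a row in accounts are all guesses, since the question does not show the schema:

```sql
-- Hypothetical junction table: one row per (account, owner) pair.
-- The int types are assumptions; match them to the real columns.
CREATE TABLE dbo.AccountOwner
(
    AccountNo int NOT NULL,
    OwnerId   int NOT NULL,
    CONSTRAINT PK_AccountOwner PRIMARY KEY (AccountNo, OwnerId)
);

-- One-time migration: unpivot the five Owner columns into rows.
-- DISTINCT guards against the same client being listed twice on one account,
-- and the WHERE clause skips unused owner slots.
INSERT INTO dbo.AccountOwner (AccountNo, OwnerId)
SELECT DISTINCT a.accountNo, o.OwnerId
FROM dbo.accounts AS a
CROSS APPLY
(
    VALUES (a.Owner1), (a.Owner2), (a.Owner3), (a.Owner4), (a.Owner5)
) AS o(OwnerId)
WHERE o.OwnerId IS NOT NULL;
```

Unlike the five fixed columns, this layout also removes the hard limit of five co-owners per account.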
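Also note:
- There is no need to access accounts again in a subquery.
- Always use explicit join syntax, not the implicit , (comma) syntax.
- Never cast columns in order to filter or join. Always cast the constants instead. In this case, you don't even need to cast them.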
SELECT
c.*,
a.accountNo
FROM Client AS c
JOIN AccountOwner AS ao ON ao.OwnerId = c.ClientId
JOIN Accounts AS a
ON a.AccountNo = ao.AccountNo
AND a.Account_Date >= '20101115'
AND a.Account_Date < '20170425';
For this query to work efficiently, you will need the following indexes:
Account (Account_Date, AccountNo)
AccountOwner (AccountNo, OwnerId)
Client (ClientId) INCLUDE (OtherColumns)
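Written out as DDL, the list above might look like the following sketch. The index names are invented, the table is taken to be dbo.Accounts as in the query, and FirstName and LastName merely stand in for OtherColumns, which the answer leaves unspecified:

```sql
-- Supports the date-range filter and carries AccountNo for the join.
CREATE INDEX IX_Accounts_AccountDate_AccountNo
    ON dbo.Accounts (Account_Date, AccountNo);

-- Lets each qualifying account find its owners.
-- If AccountOwner already has a primary key on (AccountNo, OwnerId),
-- that key provides this index and this statement is redundant.
CREATE INDEX IX_AccountOwner_AccountNo_OwnerId
    ON dbo.AccountOwner (AccountNo, OwnerId);

-- FirstName and LastName are hypothetical placeholders for OtherColumns.
-- If ClientId is already the clustered primary key, the clustered index
-- covers every column and this index is not needed.
CREATE INDEX IX_Client_ClientId
    ON dbo.Client (ClientId) INCLUDE (FirstName, LastName);
```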
An alternative set of indexes may prove to be a better access strategy (you will need to test):
Account (AccountNo) INCLUDE (Account_Date)
AccountOwner (OwnerId, AccountNo)
Client (ClientId) INCLUDE (OtherColumns)
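One way to run that test is to create the alternative indexes and compare logical reads and timings with SET STATISTICS IO and SET STATISTICS TIME. This is only a sketch: the index names are invented, the table is assumed to be dbo.Accounts as in the query, and the Client index is left out because it is the same in both sets:

```sql
-- Alternative set; index names are hypothetical.
CREATE INDEX IX_Accounts_AccountNo
    ON dbo.Accounts (AccountNo) INCLUDE (Account_Date);

CREATE INDEX IX_AccountOwner_OwnerId_AccountNo
    ON dbo.AccountOwner (OwnerId, AccountNo);

-- Report logical reads and CPU/elapsed time for each statement
-- in the messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT c.*, a.AccountNo
FROM dbo.Client AS c
JOIN dbo.AccountOwner AS ao ON ao.OwnerId = c.ClientId
JOIN dbo.Accounts AS a
    ON a.AccountNo = ao.AccountNo
   AND a.Account_Date >= '20101115'
   AND a.Account_Date <  '20170425';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the actual execution plans with each set in place shows which indexes the optimizer really picks for the data at hand.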