How to express a range over multiple columns with a hierarchical relation?
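I'm porting old accounting software to SQL. Here is a sample chart of accounts:
Account | SubAcct | SubSubAcct | SubSubSubAcct | AccountNumber | Name |
---|---|---|---|---|---|
1110 | 0 | 0 | 0 | 1110 | Banks |
1110 | 1 | 0 | 0 | 1110-1 | US Banks |
1110 | 1 | 1 | 0 | 1110-1-1 | Bank One |
1110 | 1 | 1 | 1 | 1110-1-1-1 | Bank One #123456 |
1110 | 1 | 1 | 2 | 1110-1-1-2 | Bank One #234567 |
1110 | 1 | 1 | 11 | 1110-1-1-11 | Bank One #11223344 |
1110 | 1 | 2 | 0 | 1110-1-2-0 | Bank Two |
1110 | 1 | 2 | 1 | 1110-1-2-1 | Bank Two #876543 |
1110 | 2 | 0 | 0 | 1110-2 | Foreign Banks |
1110 | 2 | 1 | 0 | 1110-2-1 | Japan One #556677 |
1120 | 0 | 0 | 0 | 1120 | Receivables |
1120 | 1 | 0 | 0 | 1120-1 | US Receivables |
1120 | 1 | 1 | 0 | 1120-1-1 | Zone One |
1120 | 1 | 1 | 1 | 1120-1-1-1 | Customer AAA |
1120 | 1 | 1 | 2 | 1120-1-1-2 | Customer BBB |
1120 | 1 | 1 | 3 | 1120-1-1-3 | Customer CCC |
1120 | 1 | 2 | 0 | 1120-1-2-0 | Zone Two |
1120 | 1 | 2 | 1 | 1120-1-2-1 | Customer WWW |
1120 | 1 | 2 | 2 | 1120-1-2-2 | Customer YYY |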
I need to query arbitrary ranges of accounts, for example from account number 1110-1-1-2 to account number 1120-1-2.
This works:
SELECT * FROM Accounts
WHERE FORMAT(Account,'D8')+'-'+
FORMAT(SubAcct,'D8')+'-'+
FORMAT(SubSubAcct,'D8')+'-'+
FORMAT(SubSubSubAcct,'D8')
BETWEEN '00001110-00000001-00000001-00000002'
AND '00001120-00000001-00000002-00000000'
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct
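But I don't think it is a good approach. Here is a SQLFiddle with a sample schema and data.
I would appreciate any ideas on how to express the query, or on a better table definition.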
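To make it concrete why the zero-padding in the query above matters, here is a minimal Python sketch (not part of the original post). It sorts three keys from the sample chart by their padded form and by the raw AccountNumber string. The helper names `padded_key` and `raw_key` are made up for illustration; `padded_key` is only meant to mimic what FORMAT(x, 'D8') produces.

```python
# Three keys taken from the sample chart of accounts.
rows = [(1110, 1, 1, 2), (1110, 1, 1, 11), (1110, 1, 2, 0)]

def padded_key(parts, width=8):
    """Zero-pad every component to a fixed width (hypothetical helper)."""
    return "-".join(f"{p:0{width}d}" for p in parts)

def raw_key(parts):
    """Unpadded form, as it is stored in AccountNumber (hypothetical helper)."""
    return "-".join(str(p) for p in parts)

by_number = sorted(rows)                  # numeric, component by component
by_padded = sorted(rows, key=padded_key)  # string order of the padded keys
by_raw = sorted(rows, key=raw_key)        # string order of the raw keys

print(by_padded == by_number)  # padded strings sort like the numbers
print(by_raw == by_number)     # raw strings do not: "11" sorts before "2"
```

So a plain BETWEEN on AccountNumber would misplace 1110-1-1-11, which is presumably why the query pads each component first.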
One alternative is to enumerate the rows and then use that enumeration:
with enumerated as (
select a.*,
row_number() over (order by Account, SubAcct, SubSubAcct, SubSubSubAcct) as seqnum
from accounts a
)
select e.*
from (select e.*,
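-- "over ()" makes these window aggregates, so they can sit next to e.* without a GROUP BY
max(case when account = 1110 and subacct = 1 and subsubacct = 1 and subsubsubacct = 2 then seqnum end) over () as seqnum_1,
max(case when account = 1120 and subacct = 1 and subsubacct = 2 then seqnum end) over () as seqnum_2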
from enumerated e
) e
where seqnum between seqnum_1 and seqnum_2;
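If you already have a "line number" column with the same ordering as seqnum, you don't need the CTE.
EDIT:
You can easily adapt this by feeding in the accounts you are looking for. The following version also adds a flag indicating whether a row in enumerated is the lower boundary, original data, or the upper boundary.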
with enumerated as (
select e.*,
row_number() over (order by Account, SubAcct, SubSubAcct, SubSubSubAcct) as seqnum
from ((select account, subacct, subsubacct, subsubsubacct, 0 as ord
from accounts a
) union all
select 1110, 1, 1, 2, -1
union all
select 1120, 1, 2, -1, 1
) e
)
select e.*
from (select e.*,
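-- "over ()" makes these window aggregates, so they can sit next to e.* without a GROUP BY
max(case when ord = -1 then seqnum end) over () as seqnum_1,
max(case when ord = 1 then seqnum end) over () as seqnum_2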
from enumerated e
) e
where seqnum between seqnum_1 and seqnum_2 and
ord = 0;
This uses -1 for the missing component, which I think is the intent: a value for that component that sorts before every real value.
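The choice of filler for a missing component changes which rows fall inside the range. The following Python sketch (my own illustration, using three rows from the sample chart) compares an upper bound of 1120-1-2 filled with 0, as the zero-padded BETWEEN in the question does, against the same bound filled with -1.

```python
# Rows near the upper end of the requested range.
accounts = [(1120, 1, 1, 3), (1120, 1, 2, 0), (1120, 1, 2, 1)]
lower = (1110, 1, 1, 2)

def in_range(row, upper):
    """Lexicographic range test, component by component."""
    return lower <= row <= upper

# Missing SubSubSubAcct treated as 0: the row 1120-1-2-0 itself is included.
with_zero = [r for r in accounts if in_range(r, (1120, 1, 2, 0))]

# Missing SubSubSubAcct treated as -1: the bound sorts before 1120-1-2-0.
with_minus_one = [r for r in accounts if in_range(r, (1120, 1, 2, -1))]

print(with_zero)
print(with_minus_one)
```

Whether 1120-1-2-0 should be part of the result is a requirements question; the sketch only shows that the two conventions differ on exactly that row.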
For completeness, here is a straightforward approach. Performance should be better than what you have now.
SELECT *
FROM Accounts
WHERE
(
account > 1110 OR
account = 1110 AND subacct > 1 OR
account = 1110 AND subacct = 1 AND subsubacct > 1 OR
account = 1110 AND subacct = 1 AND subsubacct = 1 AND subsubsubacct >= 2
) AND (
account < 1120 OR
account = 1120 AND subacct < 1 OR
account = 1120 AND subacct = 1 AND subsubacct < 2 OR
account = 1120 AND subacct = 1 AND subsubacct = 2 AND subsubsubacct <= 0
)
If the optimizer fails to find a suitable range scan, you can add account BETWEEN 1110 AND 1120 to the condition.
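The OR-of-ANDs predicate above is a hand-expanded lexicographic comparison of the four columns against each endpoint. As a sanity check, here is a small Python sketch (an illustration of the logic, not SQL Server code) that writes the predicate the same way and compares it with Python's built-in tuple comparison over a small grid of values. All function names are made up for the example.

```python
from itertools import product

LOW = (1110, 1, 1, 2)
HIGH = (1120, 1, 2, 0)

def ge_expanded(row, bound):
    """row >= bound, written as the OR-of-ANDs from the WHERE clause."""
    a, b, c, d = row
    w, x, y, z = bound
    return (a > w
            or (a == w and b > x)
            or (a == w and b == x and c > y)
            or (a == w and b == x and c == y and d >= z))

def le_expanded(row, bound):
    """row <= bound, written the same way."""
    a, b, c, d = row
    w, x, y, z = bound
    return (a < w
            or (a == w and b < x)
            or (a == w and b == x and c < y)
            or (a == w and b == x and c == y and d <= z))

def in_range_expanded(row):
    return ge_expanded(row, LOW) and le_expanded(row, HIGH)

def in_range_tuple(row):
    """Same test using Python's lexicographic tuple comparison."""
    return LOW <= row <= HIGH

# Compare both formulations on every combination in a small grid.
grid = list(product([1100, 1110, 1115, 1120, 1130], range(4), range(4), range(4)))
mismatches = [r for r in grid if in_range_expanded(r) != in_range_tuple(r)]
print(len(mismatches))
```

Some databases let you write the tuple form directly as a row-value comparison; as far as I know SQL Server does not, which is why the expanded form is needed there.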
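The best solution is to write a user-defined function that takes 12 parameters and returns TRUE or FALSE. This will make your application code more readable and less brittle, centralize the logic, simplify the queries, and even isolate the schema from your application code (especially with tuple-returning functions, which IMO are underused in this area).
You can write a UDF in almost any language, including SQL, but here is how it would look in PostgreSQL. Depending on your DBMS, you may be able to name the parameters.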
CREATE FUNCTION Between_Accounts(int, int, int, int,
int, int, int, int,
int, int, int, int) RETURNS bool LANGUAGE <whateverLang> $$
. write your comparison function ... return true/false
$$
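For example, in the code above you could use essentially the same logic you already have, or the logic from any of the solutions you have received. Or implement it in PL/SQL (or a similar language) to make it easier to read.
You can then call the function in your WHERE clause:
SELECT * FROM Accounts
WHERE Between_Accounts(Account, SubAcct, SubSubAcct, SubSubSubAcct,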
Acc1, SubAcc1, SubSubAcc1, SubSubSubAcc1,
Acc2, SubAcc2, SubSubAcc2, SubSubSubAcc2)
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct
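You can also write a function that returns a set of tuples. The application code then doesn't even need to know the name of the table. For example, the function below takes only the two endpoint accounts: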
CREATE FUNCTION Tuples_Between_Accounts(
int, int, int, int,
int, int, int, int)
RETURNS SETOF Accounts -- schema of the tuples returned
LANGUAGE sql
AS $$
-- write all your logic here and return the tuples ordered by...
-- you can reuse any of the SQL solutions given here...
-- of course the strings below are hardcoded; they would
-- need to be rewritten in terms of the function's parameters
-- (FORMAT and + are T-SQL; in PostgreSQL use lpad(...) and || instead)
SELECT * FROM Accounts
WHERE FORMAT(Account,'D8')+'-'+
FORMAT(SubAcct,'D8')+'-'+
FORMAT(SubSubAcct,'D8')+'-'+
FORMAT(SubSubSubAcct,'D8')
BETWEEN '00001110-00000001-00000001-00000002'
AND '00001120-00000001-00000002-00000000'
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct
$$
Then all you have to do is:
SELECT * FROM
Tuples_Between_Accounts(
Acc1, SubAcc1, SubSubAcc1, SubSubSubAcc1,
Acc2, SubAcc2, SubSubAcc2, SubSubSubAcc2);
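Using a UDF will make your application code less brittle and easier to maintain, because the logic for finding the tuples between two accounts lives in only one place, the DBMS.
After looking at the structure of AccountNumber, it occurred to me that there is another interesting option.
We can add a **persisted** column called HierID that converts your AccountNumber to the HierarchyID data type. Then we can take advantage of HierID.IsDescendantOf, or even apply your range.
You can alter your table as shown here, or take a look at the dbFiddle.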
Alter Table Accounts add [HierID] as convert(hierarchyid,'/'+replace(AccountNumber,'-','/')+'/') PERSISTED;
Note: creating an index is optional, but highly recommended.
Now, let's say I want everything between 1110-1-1 Bank One and 1120 Receivables (including descendants). The query would look like this:
Declare @R1 varchar(50) = '1110-1-1'
Declare @R2 varchar(50) = '1120'
Select *
from Accounts
Where HierID between convert(hierarchyid,'/'+replace(@R1,'-','/')+'/')
and convert(hierarchyid,'/'+replace(@R2+'-99999','-','/')+'/')
Now, let's say I want the descendants of 1110-1 US Banks. The query would look like this:
Declare @S varchar(50) = '1110-1'
Select *
From Accounts
Where HierID.IsDescendantOf( convert(hierarchyid,'/'+replace(@S,'-','/')+'/') ) = 1
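For readers without SQL Server at hand, here is a rough Python model of the two queries above. It is only a sketch of the idea, not how hierarchyid is implemented: a path is modeled as a tuple of integers, "descendant" as a prefix match (my understanding is that IsDescendantOf also treats a node as its own descendant), and the range as a depth-first comparison with the upper bound extended by a large filler, like the '-99999' suffix. The function names are made up.

```python
def path(account_number):
    """'1110-1-2' -> (1110, 1, 2), mirroring the '/1110/1/2/' path string."""
    return tuple(int(p) for p in account_number.split("-"))

def is_descendant_of(child, ancestor):
    """True when the ancestor's path is a prefix of the child's path."""
    c, a = path(child), path(ancestor)
    return c[:len(a)] == a

def in_subtree_range(number, low, high, pad=99999):
    """Depth-first range test; the pad pushes the upper bound past all
    descendants of `high`, assuming no component ever exceeds `pad`."""
    return path(low) <= path(number) <= path(high) + (pad,)

print(is_descendant_of("1110-1-1-2", "1110-1"))            # inside the subtree
print(is_descendant_of("1110-2", "1110-1"))                # a sibling subtree
print(in_subtree_range("1120-1-2-2", "1110-1-1", "1120"))  # descendant of the upper bound
print(in_subtree_range("1110-1", "1110-1-1", "1120"))      # parent of the lower bound
```

Python compares tuples element by element and puts a shorter prefix first, which gives the parent-before-children ordering the range query relies on.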
I would create a computed column and an index.
But beware: because FORMAT is non-deterministic, the computed column must not use FORMAT(..., 'D8').
-- FORMAT is non-deterministic, hence, not allowing INDEXes
-- Used RIGHT, which is deterministic
ALTER TABLE Accounts
ADD AccountNumberNormalized
AS
RIGHT('00000000' + CONVERT(VARCHAR, Account), 8) + '-' +
RIGHT('00000000' + CONVERT(VARCHAR, SubAcct), 8) + '-' +
RIGHT('00000000' + CONVERT(VARCHAR, SubSubAcct), 8) + '-' +
RIGHT('00000000' + CONVERT(VARCHAR, SubSubSubAcct), 8);
CREATE INDEX AK_Accounts_Normalized
ON Accounts(AccountNumberNormalized);
Then the query is as simple as this:
SELECT * FROM Accounts
WHERE
AccountNumberNormalized
BETWEEN '00001110-00000001-00000001-00000002'
AND '00001120-00000001-00000002-00000000'
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct
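The resulting fiddle is here: http://sqlfiddle.com/#!18/bc2b3/1
The data you describe is very similar to a PeopleSoft tree (https://docs.oracle.com/cd/E24150_01/pt851h2/eng/psbooks/ttrm/chapter.htm?File=ttrm/htm/ttrm03.htm), but your data is not stored in a truly hierarchical manner, which hinders efficient access. For example, your SubAcct values lose their meaning because the same values appear under multiple Account values. The value should be concatenated with its parent, because it is a node in its own right. The same goes for everything except SubSubSubAcct: you will never have more than one of those within the same node, so the values there do not matter. In other words, every node must have a unique value. Otherwise it is broken, and you end up in the predicament you are in now.
That said, the way you access the data has to change as well. Assuming you never need to target some but not all of the leaves in a node, you can change your WHERE clause so that it focuses on the node (after transforming the data).
I doubt that you really need to query "any range of accounts" rather than some range of nodes. In other words, out of banks 1 through 5, do you want all the accounts of banks 2 through 4, or only some of the accounts of bank 2 plus all the accounts of banks 3 and 4? I'm not sure I understand why the order of the leaves would matter for such a partial grab of a node (your requirement suggests it might, since you want the accounts between two endpoints). Some context would help.
Either way, I would transform this data before trying to query it.
The key requirement is "I need to query arbitrary ranges of accounts", whether or not the "account numbers" given as range endpoints actually exist. The first piece of code needed is a function that reliably parses the components of a range endpoint. In this case the function relies on an ordinal splitter called dbo.DelimitedSplit8K_LEAD (explained here).
DelimitedSplit8K_LEAD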
CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s
;
Function to split an endpoint "account number" into its parts
create function dbo.test_fnAccountParts(
@acct varchar(12))
returns table with schemabinding as return
select max(case when dl.ItemNumber=1 then Item else 0 end) a,
max(case when dl.ItemNumber=2 then Item else 0 end) sa,
max(case when dl.ItemNumber=3 then Item else 0 end) ssa,
max(case when dl.ItemNumber=4 then Item else 0 end) sssa
from dbo.DelimitedSplit8K_LEAD(@acct,'-') dl;
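To show what the endpoint parsing is meant to produce, here is a minimal Python sketch of the same idea (not a translation of the T-SQL above): split on '-', convert to integers, and fill the missing trailing levels with 0. The names `account_parts` and `normalized` and the 4-level limit are assumptions made for the example.

```python
def account_parts(acct, levels=4):
    """'1120-1-2' -> (1120, 1, 2, 0); missing trailing levels become 0."""
    parts = [int(p) for p in acct.split("-")]
    if len(parts) > levels:
        raise ValueError(f"expected at most {levels} components: {acct!r}")
    return tuple(parts + [0] * (levels - len(parts)))

def normalized(acct, width=8):
    """Zero-padded, fixed-width form that is safe to compare as a string."""
    return "-".join(f"{p:0{width}d}" for p in account_parts(acct))

print(account_parts("1110-1-1-2"))
print(account_parts("1120-1-2"))
print(normalized("1120-1-2"))
```

Because the endpoint is parsed rather than looked up, it does not have to exist in the table, which is the stated requirement.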
Query to locate the rows by row number
declare
@acct1 varchar(12)='1110-1-1-2',
@acct2 varchar(12)='1120-1-2';
with
rn_cte as (
select a.*, row_number() over (order by Account, SubAcct, SubSubAcct, SubSubSubAcct) rn
from #accounts a),
rn1_cte as (select max(rn) max_rn
from rn_cte r
cross apply dbo.test_fnAccountParts(@acct1) ap
where r.Account<=ap.a
and r.SubAcct<=ap.sa
and r.SubSubAcct<=ap.ssa
and r.SubSubSubAcct<=ap.sssa),
rn2_cte as (select max(rn) max_rn
from rn_cte r
cross apply dbo.test_fnAccountParts(@acct2) ap
where r.Account<=ap.a
and r.SubAcct<=ap.sa
and r.SubSubAcct<=ap.ssa
and r.SubSubSubAcct<=ap.sssa)
select rn.*
from rn_cte rn
cross join rn1_cte r1
cross join rn2_cte r2
where rn.rn between r1.max_rn
and r2.max_rn;
Account | SubAcct | SubSubAcct | SubSubSubAcct | AccountNumber | Name | rn |
---|---|---|---|---|---|---|
1110 | 1 | 1 | 2 | 1110-1-1-2 | Bank One #234567 | 5 |
1110 | 1 | 1 | 11 | 1110-1-1-11 | Bank One #11223344 | 6 |
1110 | 1 | 2 | 0 | 1110-1-2-0 | Bank Two | 7 |
1110 | 1 | 2 | 1 | 1110-1-2-1-1 | Bank Two #876543 | 8 |
1110 | 2 | 0 | 0 | 1110-2 | Foreign Banks | 9 |
1110 | 2 | 1 | 0 | 1110-2-1 | Japan One #556677 | 10 |
1120 | 0 | 0 | 0 | 1120 | Receivables | 11 |
1120 | 1 | 0 | 0 | 1120-1 | US Receivables | 12 |
1120 | 1 | 1 | 0 | 1120-1-1 | Zone One | 13 |
1120 | 1 | 1 | 1 | 1120-1-1-1 | Customer AAA | 14 |
1120 | 1 | 1 | 2 | 1120-1-1-2 | Customer BBB | 15 |
1120 | 1 | 1 | 3 | 1120-1-1-3 | Customer CCC | 16 |
1120 | 1 | 2 | 0 | 1120-1-2-0 | Zone Two | 17 |
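Suppose you were to add an indexed computed column called AccountNumberNormalized (as suggested in Marcus Vinicius Pompeu's answer). That is a good suggestion. You would then need a function that returns the normalized account number for an endpoint. Something like this: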
drop function if exists dbo.test_fnAccountNumberNormalized;
go
create function dbo.test_fnAccountNumberNormalized(
@acct varchar(12))
returns table with schemabinding as return
select concat_ws('-', RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=1 then Item else 0 end)) ), 8),
RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=2 then Item else 0 end)) ), 8),
RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=3 then Item else 0 end)) ), 8),
RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=4 then Item else 0 end)) ), 8))
AccountNumberNormalized
from dbo.DelimitedSplit8K_LEAD(@acct,'-') dl;
Then this query returns the same results (13 rows) as above:
declare
@acct1 varchar(12)='1110-1-1-2',
@acct2 varchar(12)='1120-1-2';
SELECT a.*
FROM #Accounts a
cross apply dbo.test_fnAccountNumberNormalized(@acct1) fn1
cross apply dbo.test_fnAccountNumberNormalized(@acct2) fn2
WHERE
a.AccountNumberNormalized
BETWEEN fn1.AccountNumberNormalized
AND fn2.AccountNumberNormalized
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct;
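These are inline table-valued functions. If you are on SQL Server 2019 (or compatibility level 150), you may be able to change them to inlined scalar functions.
[EDIT] Here is a scalar function that returns CHAR(35). It definitely cleans up the code. Performance-wise, it depends on the situation and would need to be tested. This query returns the same results as above (13 rows).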
create function dbo.test_scalar_fnAccountNumberNormalized(
@acct varchar(12))
returns char(35) as
begin
return (
select concat_ws('-', RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=1 then Item else 0 end)) ), 8),
RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=2 then Item else 0 end)) ), 8),
RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=3 then Item else 0 end)) ), 8),
RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=4 then Item else 0 end)) ), 8))
AccountNumberNormalized
from dbo.DelimitedSplit8K_LEAD(@acct,'-') dl);
end
declare
@acct1 varchar(12)='1110-1-1-2',
@acct2 varchar(12)='1120-1-2';
SELECT a.*
FROM #Accounts a
WHERE
a.AccountNumberNormalized
BETWEEN dbo.test_scalar_fnAccountNumberNormalized(@acct1)
AND dbo.test_scalar_fnAccountNumberNormalized(@acct2)
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct;
Why not convert it to one big number for the comparison? It is faster than any string calculation.
SELECT
(Account * 1000 + SubAcct *100 + SubSubAcct*10 + SubSubSubAcct) as full_Account
FROM Accounts
WHERE (Account * 1000 + SubAcct *100 + SubSubAcct*10 + SubSubSubAcct)
between 1110112 and 1120120
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct
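Ints are faster than chars, and there are plenty of good functions available for ints as you scale.
Just make sure there are no collisions: the maximum length of Account, SubAcct, etc. determines the number of zeros you put behind each component.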
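The collision warning is not hypothetical for the sample chart. Here is a short Python sketch (my own check, with made-up function names) that applies the multipliers 1000, 100, 10, 1 from the query above to two rows of the chart, and then packs the same rows with one fixed base per level that is wide enough for every component.

```python
def pack_small(row):
    """The multipliers used in the query above: 1000, 100, 10, 1."""
    a, b, c, d = row
    return a * 1000 + b * 100 + c * 10 + d

def pack(row, base):
    """One fixed base per level; safe only while every component < base."""
    a, b, c, d = row
    return ((a * base + b) * base + c) * base + d

x = (1110, 1, 1, 11)   # 1110-1-1-11
y = (1110, 1, 2, 1)    # 1110-1-2-1

print(pack_small(x), pack_small(y))      # same number for two different accounts
print(pack(x, 10**8) < pack(y, 10**8))   # a wide enough base keeps the order
print(pack(x, 10**8) > 2**63 - 1)        # but the result no longer fits in 64 bits
```

With 8 digits per level the packed value exceeds a 64-bit integer, so in a database it would need a wide decimal type, or a smaller base chosen from the real maximum of each column.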
SQL only, nothing fancy. Fast, scalable, easy to document.
SELECT
*
FROM
Accounts
WHERE
SubSubSubAcct +
SubSubAcct* 100000000 +
SubAcct * 10000000000000000 +
Account * 1000000000000000000000000
BETWEEN
1110 *1000000000000000000000000 + --Account
1 *10000000000000000 + --SubAcct
1 *100000000 + --SubSubAcct
2 --SubSubSubAcct
and
1120 *1000000000000000000000000 + --Account
1 *10000000000000000 + --SubAcct
2 *100000000 + --SubSubAcct
0 --SubSubSubAcct
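Redundant data is the main problem with the old accounting table definition. For example, it has SubAcct, SubSubAcct, SubSubSubAcct, and perhaps more Sub...Acct columns. I believe this table does not follow the normalization rules.
If you want a better table definition, I would suggest defining 3 columns instead of 6, because that lets you manage more than 3 levels of sub-accounts.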
CREATE TABLE [dbo].[Accounts](
[AccountID] [int] NOT NULL,
[ParentAccountID] [int] NULL,
[Name] [VARCHAR](100) NOT NULL,
CONSTRAINT [PK_Accounts] PRIMARY KEY CLUSTERED
(
[AccountID] ASC
),
CONSTRAINT FK_ParentAccount FOREIGN KEY (ParentAccountID)
REFERENCES Accounts(AccountID)
);
To better maintain the recursive relationship, I changed your structure and values.
INSERT INTO Accounts
([AccountID], [ParentAccountID], [Name])
VALUES
(1110,null, 'Banks'),
(11101,1110, 'US Banks'),
(111011,11101, 'Bank One'),
(1110111,111011, 'Bank One #123456'),
(1110112,111011, 'Bank One #234567'),
(11101111,1110111 , 'Bank One #11223344'),
(1110120, 1110112, 'Bank Two'),
(1110121, 1110112, 'Bank Two #876543'),
(11101211, 1110121, 'Bank Two #876543')
;
With this query you can find the 'Level', 'Path' and 'Root'.
In addition, you can filter it with the 'between' syntax.
WITH CTE_TreeAccounts
AS ( SELECT ParentAccountID ,
Name ,
Name AS FullPathName ,
CAST(AccountID AS VARCHAR(100)) AS FullPathID ,
0 AS lvl ,
AccountID,
AccountID AS rootid
FROM Accounts
WHERE ParentAccountID IS NULL
UNION ALL
SELECT ac.ParentAccountID ,
ac.Name AS name ,
CAST(CONCAT(ISNULL(actree.FullPathName, ''), ' / ',
ac.Name) AS VARCHAR(100)) AS name ,
CAST(CONCAT(ISNULL(actree.FullPathID, ''), '-',
ac.AccountID) AS VARCHAR(100)) AS name ,
actree.lvl + 1 ,
ac.AccountID,
actree.rootid
FROM Accounts AS ac
INNER JOIN CTE_TreeAccounts actree ON actree.AccountID = ac.ParentAccountID
)
Select * from CTE_TreeAccounts
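For anyone who wants to see what the recursive CTE computes without running it, here is a small Python sketch over the first four rows of the INSERT above. It walks the parent pointers upward instead of recursing downward, so it is an equivalent illustration rather than the same algorithm; the dictionary layout and the name `full_path` are assumptions made for the example.

```python
# AccountID -> (ParentAccountID, Name), taken from the first rows of the INSERT.
accounts = {
    1110: (None, "Banks"),
    11101: (1110, "US Banks"),
    111011: (11101, "Bank One"),
    1110111: (111011, "Bank One #123456"),
}

def full_path(account_id):
    """Return (level, 'Root / ... / Name'), with level 0 for a root row."""
    names = []
    level = -1
    node = account_id
    while node is not None:
        parent, name = accounts[node]
        names.append(name)
        node = parent
        level += 1
    return level, " / ".join(reversed(names))

print(full_path(1110))
print(full_path(1110111))
```

The level and path correspond to the lvl and FullPathName columns of the CTE.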
Here is a SQLFiddle with a sample schema and data.
I would create a new column called "AccountNumberRange" and fill it in exactly the way you did with FORMAT.
update Accounts set AccountNumberRange = FORMAT(Account,'D8')+'-'+
FORMAT(SubAcct,'D8')+'-'+
FORMAT(SubSubAcct,'D8')+'-'+
FORMAT(SubSubSubAcct,'D8');
After that I would specify a default value for this column, to keep it up to date.
That way you can index the column and speed up the query.
Best regards,
Julio