How to express a range over multiple columns with a hierarchical relation?

I am porting old accounting software to SQL. Here is a sample chart of accounts:

Account  SubAcct  SubSubAcct  SubSubSubAcct  AccountNumber  Name
1110     0        0           0              1110           Banks
1110     1        0           0              1110-1         US Banks
1110     1        1           0              1110-1-1       Bank One
1110     1        1           1              1110-1-1-1     Bank One #123456
1110     1        1           2              1110-1-1-2     Bank One #234567
1110     1        1           11             1110-1-1-11    Bank One #11223344
1110     1        2           0              1110-1-2-0     Bank Two
1110     1        2           1              1110-1-2-1     Bank Two #876543
1110     2        0           0              1110-2         Foreign Banks
1110     2        1           0              1110-2-1       Japan One #556677
1120     0        0           0              1120           Receivables
1120     1        0           0              1120-1         US Receivables
1120     1        1           0              1120-1-1       Zone One
1120     1        1           1              1120-1-1-1     Customer AAA
1120     1        1           2              1120-1-1-2     Customer BBB
1120     1        1           3              1120-1-1-3     Customer CCC
1120     1        2           0              1120-1-2-0     Zone Two
1120     1        2           1              1120-1-2-1     Customer WWW
1120     1        2           2              1120-1-2-2     Customer YYY

I need to query an arbitrary range of accounts, for example from account number 1110-1-1-2 to account number 1120-1-2.

This works:

SELECT * FROM Accounts 
WHERE FORMAT(Account,'D8')+'-'+
      FORMAT(SubAcct,'D8')+'-'+
      FORMAT(SubSubAcct,'D8')+'-'+
      FORMAT(SubSubSubAcct,'D8') 
   BETWEEN '00001110-00000001-00000001-00000002' 
   AND     '00001120-00000001-00000002-00000000'
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct

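But I don't think this is a good approach. Here is an SQLFiddle with sample schema and data.

I would appreciate any ideas on how to express the query, or on a better table definition.

One alternative is to enumerate the rows and then use that enumeration: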

with enumerated as (
      select a.*,
             row_number() over (order by Account, SubAcct, SubSubAcct, SubSubSubAcct) as seqnum
      from accounts a
     )
select e.*
from (select e.*,
             -- over () turns max into a window aggregate so it can sit next to e.*
             max(case when account = 1110 and subacct = 1 and subsubacct = 1 and subsubsubacct = 2 then seqnum end) over () as seqnum_1,
             max(case when account = 1120 and subacct = 1 and subsubacct = 2 then seqnum end) over () as seqnum_2
      from enumerated e
     ) e
where seqnum between seqnum_1 and seqnum_2;

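If you already have a "row number" column in the same order as seqnum, you do not need the CTE.

EDIT:

You can easily adapt this by feeding in the accounts you are looking for. The following version also adds a flag that indicates whether a row in enumerated is the lower bound, original data, or the upper bound.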

with enumerated as (
      select e.*,
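             -- ord breaks ties so a bound row sorts deterministically next to a real row with the same key
             row_number() over (order by Account, SubAcct, SubSubAcct, SubSubSubAcct, ord) as seqnum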

      from ((select account, subacct, subsubacct, subsubsubacct, 0 as ord
             from accounts a
            ) union all
            select 1110, 1, 1, 2, -1
            union all
            select 1120, 1, 2, -1, 1
           ) e
     )
select e.*
from (select e.*,
             max(case when ord = -1 then seqnum end) over () as seqnum_1,
             max(case when ord = 1 then seqnum end) over () as seqnum_2
      from enumerated e
     ) e
where seqnum between seqnum_1 and seqnum_2 and
      ord = 0;

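This uses -1 for the missing component, which I think matches the intent: a bound with a missing component sorts before every actual value of that component.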
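
The idea behind the enumeration is that once the rows are sorted, a range is just a pair of positions, and the endpoints do not have to exist in the table. Here is a minimal sketch of that idea in Python rather than SQL (the list and the name `rows_between` are made up for illustration). It also shows what the choice of padding does: with 0 for the missing component the row 1120-1-2-0 is included, with -1 it is not.

```python
from bisect import bisect_left, bisect_right

# Sorted (Account, SubAcct, SubSubAcct, SubSubSubAcct) keys from the question.
# In SQL this ordering is what row_number() over (order by ...) enumerates.
ACCOUNTS = sorted([
    (1110, 0, 0, 0), (1110, 1, 0, 0), (1110, 1, 1, 0), (1110, 1, 1, 1),
    (1110, 1, 1, 2), (1110, 1, 1, 11), (1110, 1, 2, 0), (1110, 1, 2, 1),
    (1110, 2, 0, 0), (1110, 2, 1, 0), (1120, 0, 0, 0), (1120, 1, 0, 0),
    (1120, 1, 1, 0), (1120, 1, 1, 1), (1120, 1, 1, 2), (1120, 1, 1, 3),
    (1120, 1, 2, 0), (1120, 1, 2, 1), (1120, 1, 2, 2),
])

def rows_between(lo, hi):
    """Return every key k with lo <= k <= hi; lo and hi need not be in the list."""
    return ACCOUNTS[bisect_left(ACCOUNTS, lo):bisect_right(ACCOUNTS, hi)]

# Upper bound 1120-1-2 with the missing component padded as 0 or as -1.
padded_with_zero = rows_between((1110, 1, 1, 2), (1120, 1, 2, 0))
padded_with_minus_one = rows_between((1110, 1, 1, 2), (1120, 1, 2, -1))
print(len(padded_with_zero), len(padded_with_minus_one))
```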

For completeness, here is a simple approach. Performance should be better than what you have now.

SELECT * 
FROM Accounts
WHERE 
(
  account > 1110 OR
  account = 1110 AND subacct > 1 OR
  account = 1110 AND subacct = 1 AND subsubacct > 1 OR
  account = 1110 AND subacct = 1 AND subsubacct = 1 AND subsubsubacct >= 2  
) AND (
  account < 1120 OR
  account = 1120 AND subacct < 1 OR
  account = 1120 AND subacct = 1 AND subsubacct < 2 OR
  account = 1120 AND subacct = 1 AND subsubacct = 2 AND subsubsubacct <= 0
)

If the optimizer fails to find a suitable range scan, you can add account BETWEEN 1110 AND 1120 to the condition.
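
The OR conditions above are a hand-written lexicographic comparison of the four columns. As a rough check outside SQL, the sketch below uses Python, whose tuples compare lexicographically, to confirm that the expanded form and a plain tuple comparison select the same rows. The sample rows come from the question; the function names are made up for this example.

```python
# Sample (Account, SubAcct, SubSubAcct, SubSubSubAcct) rows from the question.
ROWS = [
    (1110, 0, 0, 0), (1110, 1, 0, 0), (1110, 1, 1, 0), (1110, 1, 1, 1),
    (1110, 1, 1, 2), (1110, 1, 1, 11), (1110, 1, 2, 0), (1110, 1, 2, 1),
    (1110, 2, 0, 0), (1110, 2, 1, 0), (1120, 0, 0, 0), (1120, 1, 0, 0),
    (1120, 1, 1, 0), (1120, 1, 1, 1), (1120, 1, 1, 2), (1120, 1, 1, 3),
    (1120, 1, 2, 0), (1120, 1, 2, 1), (1120, 1, 2, 2),
]
LO = (1110, 1, 1, 2)
HI = (1120, 1, 2, 0)

def in_range_expanded(row):
    """Mirror of the SQL WHERE clause, written out column by column."""
    a, s, ss, sss = row
    above_lo = (a > 1110
                or (a == 1110 and s > 1)
                or (a == 1110 and s == 1 and ss > 1)
                or (a == 1110 and s == 1 and ss == 1 and sss >= 2))
    below_hi = (a < 1120
                or (a == 1120 and s < 1)
                or (a == 1120 and s == 1 and ss < 2)
                or (a == 1120 and s == 1 and ss == 2 and sss <= 0))
    return above_lo and below_hi

def in_range_tuple(row):
    """Same test using Python's built-in lexicographic tuple ordering."""
    return LO <= row <= HI

expanded = [r for r in ROWS if in_range_expanded(r)]
by_tuple = [r for r in ROWS if in_range_tuple(r)]
print(len(expanded), expanded == by_tuple)
```

Some databases, PostgreSQL among them, accept the row-value form (Account, SubAcct, SubSubAcct, SubSubSubAcct) >= (1110, 1, 1, 2) directly. SQL Server does not, which is why the expansion has to be written out there.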

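The best solution is to write a user-defined function that takes 12 parameters and returns TRUE or FALSE. This makes your application code more readable and less fragile, keeps the logic in one place, simplifies queries, and even isolates the schema from your application code (especially with tuple-returning functions, which IMO are underused in this area).

You can write a UDF in almost any language, including SQL, but here is how it would be done in PostgreSQL. Depending on your DBMS, you may be able to name the parameters.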

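CREATE FUNCTION Between_Accounts(int, int, int, int,   -- row being tested
                                 int, int, int, int,   -- lower bound
                                 int, int, int, int)   -- upper bound
RETURNS bool LANGUAGE sql IMMUTABLE AS $$
  -- one possible body: PostgreSQL compares row values left to right
  SELECT ($1, $2, $3, $4) >= ($5, $6, $7, $8)
     AND ($1, $2, $3, $4) <= ($9, $10, $11, $12)
$$;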

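For example, the function body could use essentially the same logic you already have, or the logic from any of the other answers you have received. Or implement it in PL/SQL (or a similar language) to make it easier to read.

You can then call the function in your WHERE clause: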

SELECT * FROM Accounts 
WHERE Between_Accounts(Account,SubAcct, SubSubAcct, SubSubSubAcct,
            Acc1, SubAcc1, SubSubAcc1, SubSubSubAcc1,
            Acc2, SubAcc2, SubSubAcc2, SubSubSubAcc2)
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct

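You can also write a function that returns a set of tuples. The application code then does not even need to know the name of the table. For example, the function below takes only the two account bounds: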

CREATE FUNCTION Tuples_Between_Accounts( 
                                 int, int, int, int,   -- lower bound
                                 int, int, int, int)   -- upper bound
RETURNS SETOF Accounts --schema of the tuples returned
LANGUAGE sql AS
  $$
  -- write all your logic here and return the tuples ordered by...
  -- you can reuse any of the SQL solutions given here;
  -- this version compares row values against the parameters
    SELECT * FROM Accounts
    WHERE (Account, SubAcct, SubSubAcct, SubSubSubAcct) >= ($1, $2, $3, $4)
      AND (Account, SubAcct, SubSubAcct, SubSubSubAcct) <= ($5, $6, $7, $8)
    ORDER BY Account, SubAcct, SubSubAcct, SubSubSubAcct
  $$;
    

Then all you have to do is:

SELECT * FROM
 Tuples_Between_Accounts(
            Acc1, SubAcc1, SubSubAcc1, SubSubSubAcc1,
            Acc2, SubAcc2, SubSubAcc2, SubSubSubAcc2);

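Using a UDF makes your application code less fragile and easier to maintain, because the logic for finding the tuples between two accounts lives in one place, in the DBMS.

After looking at the structure of AccountNumber, it occurred to me that there is another interesting option.

We can add a persisted column called HierID that converts your AccountNumber to the HierarchyID data type. We can then take advantage of HierID.IsDescendantOf, or even apply your range.

You can alter the table as shown below, or take a look at the dbFiddle.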

Alter Table Accounts add [HierID] as convert(hierarchyid,'/'+replace(AccountNumber,'-','/')+'/')  PERSISTED;

Note: creating an index is optional, but highly recommended.


Now, say I want everything between 1110-1-1 Bank One and 1120 Receivables (including descendants). The query would look like this:

Declare @R1 varchar(50) = '1110-1-1'
Declare @R2 varchar(50) = '1120'

Select * 
  from Accounts
  Where HierID between convert(hierarchyid,'/'+replace(@R1,'-','/')+'/')
                   and convert(hierarchyid,'/'+replace(@R2+'-99999','-','/')+'/')

Now, say I want the descendants of 1110-1 US Banks. The query would look like this:

 Declare @S varchar(50) = '1110-1'

 Select * 
  From Accounts
  Where HierID.IsDescendantOf( convert(hierarchyid,'/'+replace(@S,'-','/')+'/') ) = 1
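
To see why both queries behave as they do, it can help to model a path as a tuple of integers. The sketch below is only an approximation of the hierarchyid semantics as I understand them (depth-first order, where a parent sorts before its descendants, and a node counts as its own descendant). `to_path` and `is_descendant_of` are hypothetical helper names, not SQL Server functions.

```python
def to_path(account_number):
    """'1110-1-1' -> (1110, 1, 1); mirrors the '/1110/1/1/' string built in SQL."""
    return tuple(int(part) for part in account_number.split("-"))

def is_descendant_of(node, ancestor):
    """True when ancestor is a prefix of node (a node is its own descendant)."""
    return node[:len(ancestor)] == ancestor

NUMBERS = ["1110", "1110-1", "1110-1-1", "1110-1-1-1", "1110-1-1-2",
           "1110-1-1-11", "1110-1-2-0", "1110-1-2-1", "1110-2", "1110-2-1",
           "1120", "1120-1", "1120-1-1"]
PATHS = [to_path(n) for n in NUMBERS]

# Range query: tuples sort parent first, then children in numeric order,
# so appending a large sentinel child (99999) to the upper bound pulls in
# the whole 1120 subtree, as the '-99999' suffix does in the SQL.
lo, hi = to_path("1110-1-1"), to_path("1120-99999")
in_range = [p for p in PATHS if lo <= p <= hi]

# Subtree query: everything at or under 1110-1.
subtree = [p for p in PATHS if is_descendant_of(p, to_path("1110-1"))]
print(len(in_range), len(subtree))
```

The sentinel only works while no real child number exceeds it, which is an assumption the SQL version shares.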

I would create a computed column and an index.

But beware: because FORMAT is non-deterministic, the computed column should not use FORMAT(..., 'D8').

-- FORMAT is non-deterministic, hence, not allowing INDEXes
-- Used RIGHT, which is deterministic

ALTER TABLE Accounts
ADD AccountNumberNormalized
AS
    RIGHT('00000000' + CONVERT(VARCHAR, Account),       8) + '-' +
    RIGHT('00000000' + CONVERT(VARCHAR, SubAcct),       8) + '-' +
    RIGHT('00000000' + CONVERT(VARCHAR, SubSubAcct),    8) + '-' +
    RIGHT('00000000' + CONVERT(VARCHAR, SubSubSubAcct), 8);

CREATE INDEX AK_Accounts_Normalized
ON Accounts(AccountNumberNormalized);

Then the query is as simple as this:

SELECT * FROM Accounts 
WHERE
   AccountNumberNormalized
       BETWEEN '00001110-00000001-00000001-00000002' 
       AND     '00001120-00000001-00000002-00000000'
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct

The resulting fiddle is here: http://sqlfiddle.com/#!18/bc2b3/1
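
The padding matters because strings compare character by character, so unpadded account numbers sort incorrectly as soon as a component has more than one digit. A small illustration in Python (the helper name `normalize` is made up here; it mimics the RIGHT('00000000' + ..., 8) expression for components of up to 8 digits):

```python
def normalize(account, sub, subsub, subsubsub, width=8):
    """Zero-pad each component so that string order matches numeric order."""
    parts = (account, sub, subsub, subsubsub)
    return "-".join(str(part).zfill(width) for part in parts)

# Unpadded: '1110-1-1-11' sorts before '1110-1-1-2' because '1' < '2'.
unpadded_wrong = "1110-1-1-11" < "1110-1-1-2"

# Padded: the numeric order 2 < 11 is preserved.
padded_right = normalize(1110, 1, 1, 2) < normalize(1110, 1, 1, 11)

print(unpadded_wrong, padded_right)
print(normalize(1110, 1, 1, 2))
```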

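The data you describe is very similar to a PeopleSoft tree (https://docs.oracle.com/cd/E24150_01/pt851h2/eng/psbooks/ttrm/chapter.htm?File=ttrm/htm/ttrm03.htm), but your data is not stored in a truly hierarchical way, which hinders efficient access. For example, your SubAcct values lose their meaning because the same value appears under multiple Account values. The value should be concatenated with its parent, because it is a node in its own right. The same goes for everything except SubSubSubAcct: you will never have more than one of those within the same node, so the values there do not matter. In other words, you must have a unique value for every node. Otherwise the hierarchy is broken and you end up in the predicament you are in now.

That said, the way you access the data has to change as well. Assuming you never need to target some but not all of the leaves in a node, you can change the WHERE clause conditions to focus on nodes (after transforming the data).

I doubt you really need to query "any range of accounts" rather than certain ranges of nodes. In other words, out of banks 1 through 5, do you want all accounts of banks 2 through 4, or only some of the accounts of bank 2 plus all accounts of banks 3 and 4? I am not sure I understand why the order of the leaves would matter for this kind of partial grab of a node (your requirement suggests it might, since you need accounts BETWEEN). Some context would help.

Either way, I would transform this data before attempting to query it.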

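The key requirement: "I need to query an arbitrary range of accounts", whether or not the "account numbers" in the range endpoints actually exist. The first piece of code needed is a function that reliably parses the components of the range endpoints. In this case, the function relies on an ordinal splitter called dbo.DelimitedSplit8K_LEAD (explained here).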

DelimitedSplit8K_LEAD

CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT 0 UNION ALL
                 SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT t.N+1
                   FROM cteTally t
                  WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0) 
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
        Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
   FROM cteStart s
;

Function to split an endpoint "account number"

create function dbo.test_fnAccountParts(
    @acct            varchar(12))
returns table with schemabinding as return 
select max(case when dl.ItemNumber=1 then Item else 0 end) a,
       max(case when dl.ItemNumber=2 then Item else 0 end) sa,
       max(case when dl.ItemNumber=3 then Item else 0 end) ssa,
       max(case when dl.ItemNumber=4 then Item else 0 end) sssa
from dbo.DelimitedSplit8K_LEAD(@acct,'-') dl;
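
What the function does is easy to state: split on '-', take up to four components in order, and treat any missing trailing component as 0. A minimal sketch of the same rule in Python, for readers who want to test endpoint parsing outside the database (the name `account_parts` is made up, and rejecting more than four components is my own assumption rather than behaviour of the SQL function):

```python
def account_parts(acct):
    """Split 'a-sa-ssa-sssa' into four ints, padding missing parts with 0."""
    parts = [int(piece) for piece in acct.split("-")]
    if len(parts) > 4:
        # Assumption: the SQL version silently ignores extra items; here we fail loudly.
        raise ValueError(f"too many components in {acct!r}")
    return tuple(parts + [0] * (4 - len(parts)))

print(account_parts("1110-1-1-2"))
print(account_parts("1120-1-2"))
```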

Query to locate the rows by ordinal position

declare
  @acct1            varchar(12)='1110-1-1-2',
  @acct2            varchar(12)='1120-1-2';

with
rn_cte as (
      select a.*, row_number() over (order by Account, SubAcct, SubSubAcct, SubSubSubAcct) rn
      from #accounts a),
rn1_cte as (select max(rn) max_rn 
            from rn_cte r
                 cross apply dbo.test_fnAccountParts(@acct1) ap
            where r.Account<=ap.a
                  and r.SubAcct<=ap.sa
                  and r.SubSubAcct<=ap.ssa
                  and r.SubSubSubAcct<=ap.sssa),
rn2_cte as (select max(rn) max_rn 
            from rn_cte r
                 cross apply dbo.test_fnAccountParts(@acct2) ap
            where r.Account<=ap.a
                  and r.SubAcct<=ap.sa
                  and r.SubSubAcct<=ap.ssa
                  and r.SubSubSubAcct<=ap.sssa)
select rn.*
from rn_cte rn
     cross join rn1_cte r1
     cross join rn2_cte r2
where rn.rn between r1.max_rn
                and r2.max_rn;

Account  SubAcct  SubSubAcct  SubSubSubAcct  AccountNumber  Name                rn
1110     1        1           2              1110-1-1-2     Bank One #234567    5
1110     1        1           11             1110-1-1-11    Bank One #11223344  6
1110     1        2           0              1110-1-2-0     Bank Two            7
1110     1        2           1              1110-1-2-1     Bank Two #876543    8
1110     2        0           0              1110-2         Foreign Banks       9
1110     2        1           0              1110-2-1       Japan One #556677   10
1120     0        0           0              1120           Receivables         11
1120     1        0           0              1120-1         US Receivables      12
1120     1        1           0              1120-1-1       Zone One            13
1120     1        1           1              1120-1-1-1     Customer AAA        14
1120     1        1           2              1120-1-1-2     Customer BBB        15
1120     1        1           3              1120-1-1-3     Customer CCC        16
1120     1        2           0              1120-1-2-0     Zone Two            17

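Suppose you want to add an indexed computed column called AccountNumberNormalized (as suggested in Marcus Vinicius Pompeu's answer). That is good advice. You would then need a function that returns the normalized account number for an endpoint. Something like this: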

drop function if exists dbo.test_fnAccountNumberNormalized;
go
create function dbo.test_fnAccountNumberNormalized(
    @acct            varchar(12))
returns table with schemabinding as return 
select concat_ws('-', RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=1 then Item else 0 end)) ), 8),
                      RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=2 then Item else 0 end)) ), 8),
                      RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=3 then Item else 0 end)) ), 8),
                      RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=4 then Item else 0 end)) ), 8)) 
                      AccountNumberNormalized
from dbo.DelimitedSplit8K_LEAD(@acct,'-') dl;

This query then returns the same result (13 rows) as above:

declare
  @acct1            varchar(12)='1110-1-1-2',
  @acct2            varchar(12)='1120-1-2';

SELECT a.* 
FROM #Accounts a
     cross apply dbo.test_fnAccountNumberNormalized(@acct1) fn1
     cross apply dbo.test_fnAccountNumberNormalized(@acct2) fn2
WHERE
   a.AccountNumberNormalized
       BETWEEN fn1.AccountNumberNormalized
       AND     fn2.AccountNumberNormalized
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct;

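These are inline table-valued functions. If you are on SQL Server 2019 (or compatibility level 150), you may be able to change them to inlined scalar functions.

[EDIT] Here is a scalar function that returns CHAR(35). It certainly cleans up the code. Performance-wise, it depends on the situation and needs to be tested. This query returns the same result as above (13 rows).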

create function dbo.test_scalar_fnAccountNumberNormalized(
    @acct            varchar(12))
returns char(35) as 
begin
return (
select concat_ws('-', RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=1 then Item else 0 end)) ), 8),
                      RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=2 then Item else 0 end)) ), 8),
                      RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=3 then Item else 0 end)) ), 8),
                      RIGHT('00000000' + CONVERT(VARCHAR, (max(case when dl.ItemNumber=4 then Item else 0 end)) ), 8)) 
                      AccountNumberNormalized
from dbo.DelimitedSplit8K_LEAD(@acct,'-') dl);
end
go

declare
  @acct1            varchar(12)='1110-1-1-2',
  @acct2            varchar(12)='1120-1-2';

SELECT a.* 
FROM #Accounts a
WHERE
   a.AccountNumberNormalized
       BETWEEN dbo.test_scalar_fnAccountNumberNormalized(@acct1) 
       AND     dbo.test_scalar_fnAccountNumberNormalized(@acct2) 
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct;

Why not turn it into one big number and compare that? It is faster than any string computation.

SELECT 
  (Account * 1000 + SubAcct *100 + SubSubAcct*10 + SubSubSubAcct) as full_Account
FROM Accounts 
WHERE (Account * 1000 + SubAcct *100 + SubSubAcct*10 + SubSubSubAcct) 
       between 1110112 and 1120120
ORDER BY Account,SubAcct,SubSubAcct,SubSubSubAcct

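Ints are faster than chars, and there are plenty of good functions available for ints as you go further.

Just make sure there are no collisions: the maximum length of SubAcct, SubSubAcct, and so on determines how many zeros you have to put behind each higher-level key.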
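
The collision warning is worth checking against the sample data, because one digit per level is not enough once a component reaches 10. The sketch below is a hedged illustration in Python; `pack_narrow` copies the multipliers from the query above, while `pack` and the base of 10**8 are assumptions chosen for this example, valid only while every sub-level value stays below the base.

```python
def pack_narrow(account, sub, subsub, subsubsub):
    """Multipliers from the query above: one decimal digit per sub-level."""
    return account * 1000 + sub * 100 + subsub * 10 + subsubsub

def pack(account, sub, subsub, subsubsub, base=10**8):
    """Give each sub-level its own slot of `base` values."""
    for value in (sub, subsub, subsubsub):
        if not 0 <= value < base:
            raise ValueError(f"component {value} does not fit in base {base}")
    return ((account * base + sub) * base + subsub) * base + subsubsub

a = (1110, 1, 1, 11)   # Bank One #11223344
b = (1110, 1, 2, 1)    # Bank Two #876543

# Two different accounts, one packed value: the range test cannot tell them apart.
collision = pack_narrow(*a) == pack_narrow(*b)

# With a wide enough base the packed values are distinct and keep tuple order.
order_kept = (pack(*a) < pack(*b)) == (a < b)
print(collision, order_kept)
```

Python integers do not overflow, so the sketch hides one concern: in SQL the packed value has to fit the column type, and four slots of eight digits each need a type wider than BIGINT.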

SQL: no need to get fancy. Fast, scalable, easy to document.

SELECT  
*  
FROM 
 Accounts 
WHERE
  SubSubSubAcct  +
  SubSubAcct*  100000000 +
  SubAcct   *  10000000000000000 +
  Account   *  1000000000000000000000000   
BETWEEN  
  1110 *1000000000000000000000000 + --Account
  1    *10000000000000000 +         --SubAcct
  1    *100000000 +                 --SubSubAcct
  2                                 --SubSubSubAcct
and 
  1120 *1000000000000000000000000 + --Account
  1    *10000000000000000 +         --SubAcct
  2    *100000000 +                 --SubSubAcct
  0                                 --SubSubSubAcct

SQL FIDDLE

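Redundant data is the main problem with the old accounting table definition. For example, it has SubAcct, SubSubAcct, SubSubSubAcct, and perhaps further Sub...Acct columns. I believe this table does not follow the rules of normalization.

If you want a better table definition, I would suggest defining 3 columns instead of 6, which also lets you manage more than 3 levels of subaccounts.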

CREATE TABLE [dbo].[Accounts](
    [AccountID] [int] NOT NULL,
    [ParentAccountID] [int] NULL,
    [Name] [VARCHAR](100) NOT NULL,
 CONSTRAINT [PK_Accounts] PRIMARY KEY CLUSTERED 
(
    [AccountID] ASC
),
  CONSTRAINT FK_ParentAccount FOREIGN KEY (ParentAccountID)
    REFERENCES Accounts(AccountID)
);

To maintain the recursive relationship properly, I changed your structure and values.

INSERT INTO Accounts
    ([AccountID], [ParentAccountID], [Name])
VALUES
    (1110,null, 'Banks'),
    (11101,1110, 'US Banks'),
    (111011,11101, 'Bank One'),
    (1110111,111011, 'Bank One #123456'),
    (1110112,111011, 'Bank One #234567'),
    (11101111,1110111 , 'Bank One #11223344'),
    (1110120, 1110112, 'Bank Two'),
    (1110121, 1110112, 'Bank Two #876543'),
    (11101211, 1110121, 'Bank Two #876543')
;

With this query you can find the 'Level', 'Path', and 'Root'. In addition, you can filter the result using the BETWEEN syntax.

WITH    CTE_TreeAccounts
              AS ( SELECT   ParentAccountID ,
                            Name ,
                            Name AS FullPathName ,
                            CAST(AccountID AS VARCHAR(100)) AS FullPathID ,
                            0 AS lvl ,
                            AccountID,
                            AccountID AS rootid
                   FROM     Accounts
                   WHERE    ParentAccountID IS NULL
                   UNION ALL
                   SELECT   ac.ParentAccountID  ,
                            ac.Name AS name ,
                            CAST(CONCAT(ISNULL(actree.FullPathName, ''), ' / ',
                                        ac.Name) AS VARCHAR(100)) AS name ,
                            CAST(CONCAT(ISNULL(actree.FullPathID, ''), '-',
                                        ac.AccountID) AS VARCHAR(100)) AS name ,
                            actree.lvl + 1 ,
                            ac.AccountID,
                            actree.rootid 
                   FROM     Accounts AS ac
                            INNER JOIN CTE_TreeAccounts actree ON actree.AccountID = ac.ParentAccountID
                 )
Select * from CTE_TreeAccounts

Here is an SQLFiddle with sample schema and data.
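
For readers less familiar with recursive CTEs, the same walk can be sketched procedurally: start at the rows with no parent, then repeatedly attach children, carrying the level, the root, and the accumulated path along. This is only an illustration in Python over a subset of the rows inserted above; the function name `walk` is made up, and unlike the CTE it yields rows in a fixed parent-before-child order.

```python
# (AccountID, ParentAccountID, Name): a subset of the rows inserted above.
ACCOUNTS = [
    (1110, None, "Banks"),
    (11101, 1110, "US Banks"),
    (111011, 11101, "Bank One"),
    (1110111, 111011, "Bank One #123456"),
    (1110112, 111011, "Bank One #234567"),
]

def walk(rows):
    """Yield (account_id, level, root_id, full_path_name), parents first."""
    children = {}
    for account_id, parent_id, name in rows:
        children.setdefault(parent_id, []).append((account_id, name))
    # Roots are the rows whose parent is None, like the anchor part of the CTE.
    stack = [(aid, 0, aid, name) for aid, name in reversed(children.get(None, []))]
    while stack:
        account_id, level, root_id, path = stack.pop()
        yield account_id, level, root_id, path
        # Push children in reverse so they are popped in their original order.
        for child_id, child_name in reversed(children.get(account_id, [])):
            stack.append((child_id, level + 1, root_id, f"{path} / {child_name}"))

tree = list(walk(ACCOUNTS))
for row in tree:
    print(row)
```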

I would create a new column called "AccountNumberRange" and fill it exactly the way you are already doing with FORMAT.

update Accounts set AccountNumberRange = FORMAT(Account,'D8')+'-'+
      FORMAT(SubAcct,'D8')+'-'+
      FORMAT(SubSubAcct,'D8')+'-'+
      FORMAT(SubSubSubAcct,'D8');

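After that I would assign a default value to this column so that it stays up to date.

Doing this lets you index the column and speed up the query.

Best regards, Julio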