Why was the DelimitedSplit8k UDF written with 2X (Cartesian product) in SQL Server?

I asked a question about writing a fast inline table-valued function in SQL Server.

The code in the answer works, but my question is about one part of it.

I understand that he wants to create many rows of 1 (1,1,1,1,1,...) and then turn them into sequential numbers (1,2,3,4,5,6,...).

Specifically, this part:

WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS (SELECT 1 FROM E1 a, E1 b)
,E4(N) AS (SELECT 1 FROM E2 a, E2 b)
SELECT * FROM E4 --10000 rows

That produces 10,000 rows.
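To make the "turn them into sequential numbers" step concrete, here is a minimal sketch that numbers the 10,000 rows of E4 with ROW_NUMBER(). The temp table #Seq is a hypothetical name used only so the result can be inspected afterwards; it is not part of the original function.

```sql
-- Base-10 cascading CTEs: every row holds the constant 1; only the row COUNT matters.
WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)                                        -- 10 rows
,E2(N) AS (SELECT 1 FROM E1 a, E1 b)     -- 10 * 10   = 100 rows
,E4(N) AS (SELECT 1 FROM E2 a, E2 b)     -- 100 * 100 = 10,000 rows
-- ROW_NUMBER() replaces the identical 1s with 1, 2, 3, ..., 10000.
-- ORDER BY (SELECT NULL) satisfies the mandatory ORDER BY without asking for a real sort.
SELECT N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  INTO #Seq                              -- hypothetical temp table, for inspection only
  FROM E4;

SELECT MinN = MIN(N), MaxN = MAX(N), NumRows = COUNT(*) FROM #Seq;
```

The cross joins only multiply row counts; the actual numbering comes entirely from ROW_NUMBER().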

This function is widely used, so here is my question:

Why didn't he (Jeff Moden) use:

WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS (SELECT 1 FROM E1 a, E1 b , E1 c , E1 d)

SELECT * FROM E2 -- ALSO 10000 rows !!!

rather than splitting it into E2 and E4?
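Both shapes do give the same row count, because (10 × 10) × (10 × 10) = 10 × 10 × 10 × 10 = 10,000. The sketch below checks this side by side; the CTE names E4_TwoStep and E4_OneStep and the temp table #Counts are hypothetical. Matching row counts say nothing about which shape the author preferred or why; the answers that follow address that.

```sql
WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)                                                              -- 10 rows
,E2(N)         AS (SELECT 1 FROM E1 a, E1 b)                   -- 10^2     = 100 rows
,E4_TwoStep(N) AS (SELECT 1 FROM E2 a, E2 b)                   -- (10^2)^2 = 10,000 rows
,E4_OneStep(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)       -- 10^4     = 10,000 rows
SELECT TwoStep = (SELECT COUNT(*) FROM E4_TwoStep),
       OneStep = (SELECT COUNT(*) FROM E4_OneStep)
  INTO #Counts;                                                -- hypothetical temp table

SELECT TwoStep, OneStep FROM #Counts;
```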

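I'm not Jeff Moden and don't know his reasoning, but it seems likely that he simply used a well-known pattern for generating numbers, which he himself calls Itzik Ben-Gan's cross-joined CTE method in this Stack Overflow answer.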

The pattern looks like this:

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
     E02(N) AS (SELECT 1 FROM E00 a, E00 b),
     E04(N) AS (SELECT 1 FROM E02 a, E02 b),
     E08(N) AS (SELECT 1 FROM E04 a, E04 b),
     ...
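The "..." above leaves the pattern open-ended. As an illustration only, here is one way to carry it one more level, to a level named E16, and number the rows. The E16 level and the #Base2 temp table are additions for this sketch, not part of the answer. Each level squares the previous row count, so E16 has 256 × 256 = 65,536 rows, which is enough to take TOP (8000).

```sql
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),   -- 2 rows
     E02(N) AS (SELECT 1 FROM E00 a, E00 b),    -- 2 * 2     = 4 rows
     E04(N) AS (SELECT 1 FROM E02 a, E02 b),    -- 4 * 4     = 16 rows
     E08(N) AS (SELECT 1 FROM E04 a, E04 b),    -- 16 * 16   = 256 rows
     E16(N) AS (SELECT 1 FROM E08 a, E08 b)     -- 256 * 256 = 65,536 rows (hypothetical extra level)
-- Keep only as many numbers as needed, here 8000.
SELECT TOP (8000) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  INTO #Base2                                   -- hypothetical temp table
  FROM E16;

SELECT NumRows = COUNT(*), MaxN = MAX(N) FROM #Base2;
```

Reaching 8,000 rows takes five CTEs in base 2, compared with three (E1, E2, E4) when the first CTE has ten rows.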

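To adapt the method to his string-splitting function, he apparently found it more convenient to change the initial CTE to produce ten rows instead of two, and to reduce the number of cross-joined CTEs to just two, which is enough to cover the 8,000 rows his solution needs.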
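For context on why 8,000 is the target: as the "8k" in its name suggests, the splitter takes a VARCHAR(8000) input, so it needs at most one number per character position. The following is a simplified sketch, not the published DelimitedSplit8K code, of how a base-10 tally like this can drive a split. The tally enumerates character positions, each position holding a delimiter marks where the next element starts, and CHARINDEX finds where each element ends. @pString, @pDelimiter, cteTally, cteStart and #Items are illustrative names, and the inline DECLARE initializers assume SQL Server 2008 or later.

```sql
-- Illustrative inputs (hypothetical names); inline DECLARE initializers need SQL Server 2008+.
DECLARE @pString    VARCHAR(8000) = 'a,bb,,ccc';
DECLARE @pDelimiter CHAR(1)       = ',';

WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)                                               -- 10 rows
,E2(N) AS (SELECT 1 FROM E1 a, E1 b)            -- 100 rows
,E4(N) AS (SELECT 1 FROM E2 a, E2 b)            -- 10,000 rows: covers any 8000-character string
,cteTally(N) AS (                               -- one number per character position: 1..length
    SELECT TOP (ISNULL(DATALENGTH(@pString), 0))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM E4
)
,cteStart(N1) AS (                              -- position where each element starts
    SELECT 1                                    -- the first element starts at position 1
    UNION ALL
    SELECT t.N + 1                              -- every later element starts just after a delimiter
      FROM cteTally t
     WHERE SUBSTRING(@pString, t.N, 1) = @pDelimiter
)
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY s.N1),
       -- Length = distance to the next delimiter; if there is none, take the rest of the string.
       Item       = SUBSTRING(@pString, s.N1,
                              ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000))
  INTO #Items                                   -- hypothetical temp table
  FROM cteStart s;

SELECT ItemNumber, Item FROM #Items ORDER BY ItemNumber;   -- a, bb, (empty), ccc
```

DATALENGTH is used instead of LEN so that trailing spaces still count as character positions.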

Heh... just ran across this and thought I'd answer.

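Andriy M is absolutely correct. It's largely modeled after Itzik Ben-Gan's original Base 2 code and, yes, I changed it (as have many others) to Base 10 code just to reduce the number of cCTEs (cascading CTEs). The latest code that I and many others use reduces the number of cCTEs even further. It also uses the VALUES operator to reduce the amount of code, although there is no performance advantage in doing so.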

WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))E0(N)) --10 rows
    ,E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)
SELECT * FROM E4 --10000 rows
;

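There are a lot of other places where this kind of on-the-fly sequence is needed. Some need the sequence to start at 0 and some need it to start at 1. Larger ranges of values are also needed and, to be honest, I got tired of carefully writing code similar to the above, so I did what Mr. Ben-Gan and many others have done: I wrote an iTVF called "fnTally". I don't normally use Hungarian notation for functions, but I use the "fn" prefix for two reasons: 1) I still maintain a physical Tally Table, so the function needed a different name, and 2) I can tell coworkers "If you had used the 'eff-n' Tally function I told you about, you wouldn't have this problem" without it actually being an HR violation. ;-)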

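Just in case someone needs something like this, here's the code I wrote for my version of the fnTally function. There's a small performance trade-off in allowing it to start at 0 or 1, but the extra flexibility is worth it to me, anyway. And, yes... you could reduce the number of cCTEs in it by doing 12 CROSS JOINs in the second and final cCTE. I just didn't go that route. You could, without harm.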

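Also note that I still use the SELECT/UNION ALL method to form the first 10 pseudo-rows because I still do a lot of work with folks on SQL Server 2005 (the VALUES table value constructor isn't available before SQL Server 2008), and I was on 2005 myself until about 6 months ago. Full documentation is included in the code.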

 CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
 Purpose:
 Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.

 As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.

 Usage:
--===== Syntax example (Returns BIGINT)
 SELECT t.N
   FROM dbo.fnTally(@ZeroOrOne,@MaxN) t
;

 Notes:
 1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.
    Refer to the following URLs for how it works and for an introduction to how it replaces certain loops. 
    http://www.sqlservercentral.com/articles/T-SQL/62867/
    http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers
 2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertible to the BIT data-type
    will cause the sequence to start at 1.
 3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.
 5. If @MaxN is negative or NULL, a "TOP" error will be returned.
 6. @MaxN must be a positive number >= the value of @ZeroOrOne, up to and including 1 Trillion. If a larger
    number is used, the function will silently truncate after 1 Trillion. If you actually need a sequence with
    that many values, you should consider using a different tool. ;-)
 7. There will be a substantial reduction in performance if "N" is sorted in descending order. If a descending 
    sort is required, use code similar to the following. Performance will decrease by about 27%, but it's still
    very fast, especially compared with just doing a simple descending sort on "N", which is about 20 times slower.
    If @ZeroOrOne is 0, remove the "+1" from the code.

    DECLARE @MaxN BIGINT; 
     SELECT @MaxN = 1000;
     SELECT DescendingN = @MaxN-N+1 
       FROM dbo.fnTally(1,@MaxN);

 8. There is no performance penalty for sorting "N" in ascending order because the output is explicitly sorted by
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

 Revision History:
 Rev 00 - Unknown     - Jeff Moden 
        - Initial creation with error handling for @MaxN.
 Rev 01 - 09 Feb 2013 - Jeff Moden 
        - Modified to start at 0 or 1.
 Rev 02 - 16 May 2013 - Jeff Moden 
        - Removed error handling for @MaxN because of exceptional cases.
 Rev 03 - 22 Apr 2015 - Jeff Moden
        - Modified to handle 1 Trillion rows for experimental purposes.
**********************************************************************************************************************/
        (@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS 
 RETURN WITH
  E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1)                                  --10E1 or 10 rows
, E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)      --10E4 or 10 Thousand rows
,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c)            --10E12 or 1 Trillion rows                 
            SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.
             UNION ALL 
            SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN
;
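Following the remark that the cCTE count could be cut further by doing 12 CROSS JOINs in the second and final cCTE, here is a sketch of that variant, written as a plain query rather than a function so it can be run without creating anything. The local variables are hypothetical stand-ins for the function's @ZeroOrOne and @MaxN parameters, and #Tally is an illustrative temp-table name; otherwise the logic mirrors fnTally above. The inline DECLARE initializers assume SQL Server 2008 or later.

```sql
-- Hypothetical stand-ins for fnTally's parameters.
DECLARE @ZeroOrOne BIT    = 0;    -- 0 (or NULL) starts at 0; 1 starts at 1
DECLARE @MaxN      BIGINT = 20;   -- last number to return

WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
               SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
               SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
               SELECT 1)                                      -- 10 rows
,E12(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f,
                          E1 g, E1 h, E1 i, E1 j, E1 k, E1 l)   -- 10^12 rows; TOP asks for only @MaxN
SELECT N
  INTO #Tally                                                 -- hypothetical temp table
  FROM (SELECT N = 0 WHERE ISNULL(@ZeroOrOne, 0) = 0          -- conditionally start at 0
         UNION ALL
        SELECT TOP (@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12
       ) t;

SELECT N FROM #Tally ORDER BY N;   -- 0, 1, 2, ..., 20
```

Setting @ZeroOrOne to 1 would return 1 through 20 instead. The only structural change from fnTally is that E12 cross-joins E1 twelve times directly instead of going through E4.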