Why was the DelimitedSplit8k UDF written with 2X (cartesian product) in SQL Server?
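I asked about writing a fast inline table-valued function in SQL Server.
The code in the answer works, but my question is about one part of it.
It is clear to me that he wants to create many numbers (1,1,1,1,1,...) and then turn them into sequential numbers (1,2,3,4,5,6,...).
This part: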
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS (SELECT 1 FROM E1 a, E1 b)
,E4(N) AS (SELECT 1 FROM E2 a, E2 b)
SELECT * FROM E4 --10000 rows
He creates 10000 rows.
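For reference, the second step (turning the 10000 ones into sequential numbers) can be sketched as below. This is only an illustration of the idea, not the exact code from the function; the CTE name cteTally is chosen here for the example.

```sql
-- Minimal sketch: number the 10,000 pseudo-rows with ROW_NUMBER().
WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)                                          -- 10 rows
,E2(N) AS (SELECT 1 FROM E1 a, E1 b)       -- 10 * 10 = 100 rows
,E4(N) AS (SELECT 1 FROM E2 a, E2 b)       -- 100 * 100 = 10,000 rows
,cteTally(N) AS (
    -- ORDER BY (SELECT NULL) satisfies the syntax without asking for a real sort.
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
SELECT N
FROM cteTally
ORDER BY N;                                -- 1, 2, 3, ... 10000
```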
This function is widely used, hence my question.
Question:
Why didn't he (Jeff Moden) use:
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS (SELECT 1 FROM E1 a, E1 b , E1 c , E1 d)
SELECT * FROM E2 -- ALSO 10000 rows !!!
instead of splitting it into E2 and E4?
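Although I am not Jeff Moden and do not know his reasoning, I find it likely that he simply used a known pattern of number generation, which he himself calls Itzik Ben-Gan's cross-joined CTE method in a Stack Overflow answer of his.
The pattern goes like this: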
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
E02(N) AS (SELECT 1 FROM E00 a, E00 b),
E04(N) AS (SELECT 1 FROM E02 a, E02 b),
E08(N) AS (SELECT 1 FROM E04 a, E04 b),
...
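To adapt the method for his string-splitting function, he apparently found it more convenient to change the initial CTE to ten rows instead of two and to cut the number of cross-joining CTEs down to two, which is just enough to cover the 8000 rows his solution needs.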
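To make the difference concrete, here is a rough side-by-side sketch of the two variants. It is my own illustration rather than code from either author, and it assumes SQL Server 2008 or later because of the VALUES constructor. Starting from two rows, each level squares the row count, so it takes four cross-joining CTEs to get past 8000 rows; starting from ten rows, two are enough.

```sql
-- Base 2: a seed of 2 rows, and each level squares the previous row count.
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)     -- 2 rows
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)      -- 4 rows
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)      -- 16 rows
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)      -- 256 rows
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)      -- 65,536 rows
SELECT Base2Rows = COUNT(*) FROM E16;

-- Base 10: a seed of 10 rows reaches 10,000 rows after only two levels.
WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(N)) -- 10 rows
    ,E2(N) AS (SELECT 1 FROM E1 a, E1 b)         -- 100 rows
    ,E4(N) AS (SELECT 1 FROM E2 a, E2 b)         -- 10,000 rows
SELECT Base10Rows = COUNT(*) FROM E4;
```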
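Heh... I just ran across this and thought I'd answer.
Andriy M answered it quite correctly. The code was very much modeled after Itzik Ben-Gan's original Base 2 code and, yes, I changed it (as have many others) to Base 10 code simply to reduce the number of cCTEs (cascading CTEs). The latest code that I and many others use cuts down on the number of cCTEs even more. It also uses the VALUES operator to cut down on the bulk of the code, although there's no performance advantage in doing so.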
WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))E0(N)) --10 rows
,E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)
SELECT * FROM E4 --10000 rows
;
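There are many other places where this kind of on-the-fly sequence creation is needed. Some need the sequence to start at 0 and some need it to start at 1. There is also a need for a much larger range of values and, to be honest, I got tired of carefully writing code similar to the above, so I did what Mr. Ben-Gan and many others have done: I wrote an iTVF called "fnTally". I don't normally use Hungarian notation for functions, but I use the "fn" prefix for two reasons: 1) I still maintain a physical Tally Table, so the function needed to be named differently, and 2) I can tell folks at work "If you had used the 'eff-n' Tally function I told you about, you wouldn't have this problem" without it actually being an HR violation. ;-)
Just in case anyone needs such a thing, here's the code that I wrote for my version of an fnTally function. There's a bit of a performance trade-off in allowing it to start at either 0 or 1 but, to me anyway, the extra flexibility is worth it. And, yes... you could reduce the number of cCTEs in it by doing 12 CROSS JOINs in a second and final cCTE. I just didn't go that route. You could, without harm.
Also note that I still use the SELECT/UNION ALL method to form the first 10 pseudo-rows because I still do a lot of work with people who are on 2005 and, until about 6 months ago, I was still using 2005 myself. Full documentation is included in the code.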
CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
Purpose:
Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.
As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.
Usage:
--===== Syntax example (Returns BIGINT)
SELECT t.N
FROM dbo.fnTally(@ZeroOrOne,@MaxN) t
;
Notes:
1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.
Refer to the following URLs for how it works and introduction for how it replaces certain loops.
http://www.sqlservercentral.com/articles/T-SQL/62867/
http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers
2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertible to the BIT data-type
will cause the sequence to start at 1.
3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.
5. If @MaxN is negative or NULL, a "TOP" error will be returned.
6. @MaxN must be a positive number >= the value of @ZeroOrOne, up to and including 1 Trillion. If a larger
number is used, the function will silently truncate after 1 Trillion. If you actually need a sequence with
that many values, you should consider using a different tool. ;-)
7. There will be a substantial reduction in performance if "N" is sorted in descending order. If a descending
sort is required, use code similar to the following. Performance will decrease by about 27% but it's still
very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.
If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.
DECLARE @MaxN BIGINT;
SELECT @MaxN = 1000;
SELECT DescendingN = @MaxN-N+1
FROM dbo.fnTally(1,@MaxN);
8. There is no performance penalty for sorting "N" in ascending order because the output is explicitly sorted by
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
Revision History:
Rev 00 - Unknown - Jeff Moden
- Initial creation with error handling for @MaxN.
Rev 01 - 09 Feb 2013 - Jeff Moden
- Modified to start at 0 or 1.
Rev 02 - 16 May 2013 - Jeff Moden
- Removed error handling for @MaxN because of exceptional cases.
Rev 03 - 22 Apr 2015 - Jeff Moden
- Modify to handle 1 Trillion rows for experimental purposes.
**********************************************************************************************************************/
(@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN WITH
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1) --10^1 or 10 rows
, E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d) --10^4 or 10 Thousand rows
,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c) --10^12 or 1 Trillion rows
SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.
UNION ALL
SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN
;
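As a small usage sketch tied back to string splitting, the function can stand in for the inline cCTEs when looking for delimiter positions. This is an illustration only, not the DelimitedSplit8K code itself; it assumes dbo.fnTally has been created in the current database, and the variable names and sample string are made up for the example.

```sql
-- Hypothetical sample input; DECLARE with an initializer needs SQL Server 2008+.
DECLARE @pString    VARCHAR(8000) = 'a,b,c';
DECLARE @pDelimiter CHAR(1)       = ',';

-- One row per character position, keeping only the positions that hold a delimiter.
SELECT DelimiterPosition = t.N
FROM dbo.fnTally(1, LEN(@pString)) t
WHERE SUBSTRING(@pString, t.N, 1) = @pDelimiter
ORDER BY t.N;
-- returns 2 and 4 for the sample string above
```

Starting the sequence at 1 fits here because SUBSTRING positions are 1-based; passing 0 as the first argument would add a leading row with N = 0.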