STRING_AGG not behaving as expected
I have the following query:
WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        WHEN 2 THEN STRING_AGG([Language], ' and ')
        ELSE STRING_AGG([Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]
I expected the value in the Languages column for Switzerland to be comma-separated, i.e.:
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4
Instead, I get the following output (the 4 values are separated by "and"):
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4
What am I missing?
Here is another example:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP BY y
| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b
Is this a bug in SQL Server?
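Yes, this is a Bug (tm), present in all versions of SQL Server 2017 (as of writing). It has been fixed in Azure SQL Server and in 2019 RC1. Specifically, the part of the optimizer that performs common subexpression elimination (which ensures that expressions are not calculated more often than necessary) incorrectly considers all expressions of the form STRING_AGG(x, <separator>) to be identical as long as x matches, no matter what <separator> is, and unifies them with the first such expression calculated in the query.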
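One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we are dealing with strings, concatenating an empty string will do: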
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP BY y
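The same empty-string trick can be applied to the query from the question. The following is a sketch rather than a verified fix: it assumes that changing the first argument of only one of the two STRING_AGG calls is enough to stop the optimizer from treating them as the same expression, as in the smaller example above.

```sql
-- Sketch: the original query with the '' + x workaround applied to one branch.
WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        -- First argument left unchanged here...
        WHEN 2 THEN STRING_AGG([Language], ' and ')
        -- ...and altered here, so the two calls no longer share the same x.
        ELSE STRING_AGG('' + [Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country];
```

Note that without a WITHIN GROUP (ORDER BY ...) clause, the order of the languages inside each aggregated string is not guaranteed.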