T-SQL ORDER BY ignores " '-' + ... " but not " '+' + ... "
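I recently ran into a strange bug when comparing two values.
My values range from -1 to 2.
Sometimes -1 was treated as greater than 0. The fix was simple: it turned out the column was declared as varchar(50) instead of int.
But that got me wondering why it happens. Even with the column declared as varchar(50), the character code of '-' should be lower than that of '0' ('-' is 45 and '0' is 48).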
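For reference, both orderings mentioned above can be forced explicitly. This is a minimal sketch, not the original fix: it uses an inline VALUES list instead of the real table, and the choice of Latin1_General_BIN2 as the binary collation is an assumption made for illustration.

```sql
-- Numeric ordering: convert the varchar values to int before sorting.
-- Expected order: -2, -1, 0, 1, 2
SELECT
    A.x
FROM
    (
        VALUES ('-2'), ('-1'), ('0'), ('1'), ('2')
    ) A(x)
ORDER BY
    CAST(A.x AS int);

-- Code-point ordering: a binary collation compares the raw character codes,
-- so '-' (45) sorts before '0' (48).
-- Expected order: -1, -2, 0, 1, 2
SELECT
    A.x
FROM
    (
        VALUES ('-2'), ('-1'), ('0'), ('1'), ('2')
    ) A(x)
ORDER BY
    A.x COLLATE Latin1_General_BIN2;
```

If the column can contain non-numeric strings, TRY_CAST (available from SQL Server 2012 onward) returns NULL instead of raising a conversion error.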
I ran some tests, and as far as I can tell '-' is the only character that ORDER BY does not care about.
Example:
SELECT
A.x
FROM
(
VALUES
('-5'), ('-4'), ('-3'), ('-2'), ('-1'),
('0'), ('1'), ('2'), ('3'), ('4'), ('5')
) A(x)
ORDER BY
A.x;
SELECT
B.x
FROM
(
VALUES
('+5'), ('+4'), ('+3'), ('+2'), ('+1'),
('0'), ('1'), ('2'), ('3'), ('4'), ('5')
) B(x)
ORDER BY
B.x;
Results:
Result of A
0
1
-1
2
-2
3
-3
4
-4
5
-5
Result of B
+1
+2
+3
+4
+5
0
1
2
3
4
5
(The character code of '+' is 43.)
The '+' ordering feels right, but the '-' ordering seems... wrong.
Does anyone know why this happens?
Additional info
Server version: 12.0.4213
Collation: Finnish_Swedish_CI_AS
I don't know what else could be skewing the result. Ask if you need more information.
Found the reason.
TL;DR: non-Unicode and Unicode collations sort '-' differently.
"A SQL collation's rules for sorting non-Unicode data are incompatible
with any sort routine that is provided by the Microsoft Windows
operating system; however, the sorting of Unicode data is compatible
with a particular version of the Windows sorting rules. Because the
comparison rules for non-Unicode and Unicode data are different, when
you use a SQL collation you might see different results for
comparisons of the same characters, depending on the underlying data
type. For example, if you are using the SQL collation
"SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less
than the string 'ab' because the hyphen ("-") is sorted as a separate
character that comes before "b". However, if you convert these strings
to Unicode and you perform the same comparison, the Unicode string
N'a-c' is considered to be greater than N'ab' because the Unicode
sorting rules use a "word sort" that ignores the hyphen."
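The comparison described in the quote can be tried directly. The sketch below is illustrative only: the column aliases are hypothetical, and the third column rests on my understanding that a Windows collation such as Finnish_Swedish_CI_AS (no SQL_ prefix) applies the Unicode sorting rules to varchar data as well, which would explain why the varchar column in the question ignores the hyphen.

```sql
SELECT
    -- SQL collation, varchar: per the quoted documentation the hyphen is
    -- sorted as a separate character before 'b', so 'a-c' < 'ab'.
    CASE WHEN 'a-c' COLLATE SQL_Latin1_General_CP1_CI_AS < 'ab'
         THEN 'a-c first' ELSE 'ab first' END AS sql_collation_varchar,

    -- SQL collation, nvarchar: per the quoted documentation the word sort
    -- ignores the hyphen, so N'a-c' > N'ab'.
    CASE WHEN N'a-c' COLLATE SQL_Latin1_General_CP1_CI_AS < N'ab'
         THEN 'a-c first' ELSE 'ab first' END AS sql_collation_nvarchar,

    -- Windows collation, varchar: assumed to behave like the Unicode case,
    -- consistent with the ordering observed in the question.
    CASE WHEN 'a-c' COLLATE Finnish_Swedish_CI_AS < 'ab'
         THEN 'a-c first' ELSE 'ab first' END AS windows_collation_varchar;
```

Applying COLLATE to one operand is enough here, because an explicit collation takes precedence over the default collation of the other literal.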