T-SQL ORDER BY ignores " '-' + ... " but not " '+' + ... "

I recently ran into a strange bug while comparing two values.

My values range from -1 to 2. Sometimes -1 was treated as greater than 0. The fix was simple: the column turned out to be declared as varchar(50) instead of int.
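For anyone who cannot change the column type right away, here is a minimal sketch of sorting the strings numerically at query time. The sample values are made up for illustration, and it assumes SQL Server 2012 or later, where TRY_CONVERT is available; rows that are not valid integers become NULL and sort first.

```sql
-- Hypothetical sample data standing in for the varchar(50) column.
SELECT
    A.x
FROM
    (
        VALUES
            ('2'), ('-1'), ('0'), ('1')
    ) A(x)
ORDER BY
    TRY_CONVERT(int, A.x);  -- compare as integers, not as strings
-- Returns -1, 0, 1, 2
```

Changing the column to int remains the cleaner fix, since the conversion in ORDER BY has to run for every row.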

But that got me wondering why it happened. Even with the column declared as varchar(50), the character value of '-' should be lower than that of '0' ('-' is 45 and '0' is 48).
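The character values can be checked directly, and an ordering that follows them can be forced with a binary collation. This is only a sketch: Latin1_General_BIN is one binary collation chosen for illustration, not something taken from the setup in question.

```sql
-- Character codes of the three characters involved.
SELECT
    ASCII('-') AS minus_code,  -- 45
    ASCII('+') AS plus_code,   -- 43
    ASCII('0') AS zero_code;   -- 48

-- A binary collation compares by code value, so '-' sorts before '0'.
SELECT
    A.x
FROM
    (
        VALUES
            ('-2'), ('-1'), ('0'), ('1'), ('2')
    ) A(x)
ORDER BY
    A.x COLLATE Latin1_General_BIN;
-- Returns -1, -2, 0, 1, 2
```

Note that this is still a string sort ('-1' comes before '-2'), so it matches the character-code expectation without giving numeric order.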

I ran some tests and, as far as I can tell, '-' is the only character that ORDER BY does not care about.

Example:

SELECT
    A.x
FROM
    (
        VALUES
            ('-5'), ('-4'), ('-3'), ('-2'), ('-1'),
            ('0'), ('1'), ('2'), ('3'), ('4'), ('5')
    ) A(x)
ORDER BY
    A.x;

SELECT
    B.x
FROM
    (
        VALUES
            ('+5'), ('+4'), ('+3'), ('+2'), ('+1'),
            ('0'), ('1'), ('2'), ('3'), ('4'), ('5')
    ) B(x)
ORDER BY
    B.x;

Results:

Result of A
0
1
-1
2
-2
3
-3
4
-4
5
-5

Result of B
+1
+2
+3
+4
+5
0
1
2
3
4
5

(The character value of '+' is 43.)

The '+' ordering feels right, but the '-' ordering seems... wrong.
Does anyone know why this happens?

Additional information

Server version: 12.0.4213
Collation: Finnish_Swedish_CI_AS

I'm not sure what else could skew the results. Ask if you need more information.

Found the reason.

TL;DR: the non-Unicode and Unicode collation rules sort '-' differently.

"A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen."

Source: https://support.microsoft.com/en-us/kb/322112
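The comparison described in the quote can be tried with a query like the one below. It is a rough sketch: the column aliases are my own, and the results in the comments are what the quoted article states, not output I captured from this server.

```sql
SELECT
    -- Non-Unicode (varchar) comparison under the SQL collation.
    CASE
        WHEN 'a-c' COLLATE SQL_Latin1_General_CP1_CI_AS < 'ab'
            THEN 'a-c < ab'
        ELSE 'a-c >= ab'
    END AS varchar_result,   -- per the article: 'a-c < ab'
    -- Unicode (nvarchar) comparison under the same collation.
    CASE
        WHEN N'a-c' COLLATE SQL_Latin1_General_CP1_CI_AS < N'ab'
            THEN 'a-c < ab'
        ELSE 'a-c >= ab'
    END AS nvarchar_result;  -- per the article: 'a-c >= ab'
```

Swapping Finnish_Swedish_CI_AS into both COLLATE clauses should show which of the two behaviours applies to varchar data under the collation from the question.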