CHARINDEX returns the wrong result in some COLLATIONs when searching for thorn (character 254)
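Overview
CHARINDEX occasionally returns the wrong value when using a collation such as:
Latin1_General_CI_AS
but works correctly with a collation such as:
SQL_Latin1_General_CP1_CI_AS
This has been encountered on MS SQL Server 2008 R2 and SQL Server 2016.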
Example
Assuming the database collation is:
Latin1_General_CI_AS
print CHARINDEX( CHAR(254), 'Tþ' )
-- returns 2, which is correct
print CHARINDEX( CHAR(254), 'Th' )
-- returns 1, which is wrong
print CHARINDEX( CHAR(253), 'Th' )
-- returns 0, which is correct
print CHARINDEX( CHAR(254) Collate SQL_Latin1_General_CP1_CI_AS, 'Thþ' Collate SQL_Latin1_General_CP1_CI_AS)
-- returns 3, which is correct
print CHARINDEX( CHAR(254) Collate Latin1_General_CI_AS, 'Thþ' Collate Latin1_General_CI_AS)
-- returns 1, which is wrong
Is there a known bug in the Latin1... collation sequences?
This returns the correct results:
select CHARINDEX( NCHAR(254) Collate Latin1_General_BIN2, N'Tþ' Collate Latin1_General_BIN2)
select CHARINDEX( NCHAR(254) Collate Latin1_General_BIN2, N'Th' Collate Latin1_General_BIN2 )
select CHARINDEX( NCHAR(253) Collate Latin1_General_BIN2, N'Th' Collate Latin1_General_BIN2 )
select CHARINDEX( NCHAR(254) Collate Latin1_General_BIN2, N'Thþ' Collate Latin1_General_BIN2)
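As a rough analogy only (this is not SQL Server's implementation), Python's `str.find` matches code points exactly, with no case folding and no expansions, which is the kind of comparison a BIN2 collation makes. The hypothetical helper `charindex_ordinal` wraps it to return CHARINDEX-style 1-based positions (0 when not found). For the inputs used in the four queries above it gives 2, 0, 0 and 3, the values the question expects.

```python
def charindex_ordinal(pattern: str, text: str) -> int:
    """1-based position of the first exact occurrence of pattern in text, else 0."""
    # str.find matches code points exactly: no case folding and no expansions,
    # similar in spirit to a BIN2 collation (or Ordinal comparison in C#).
    return text.find(pattern) + 1


THORN = chr(254)    # þ, CHAR(254) in code page 1252 and NCHAR(254)
Y_ACUTE = chr(253)  # ý, CHAR(253) in code page 1252 and NCHAR(253)

print(charindex_ordinal(THORN, "T" + THORN))   # 2
print(charindex_ordinal(THORN, "Th"))          # 0
print(charindex_ordinal(Y_ACUTE, "Th"))        # 0
print(charindex_ordinal(THORN, "Th" + THORN))  # 3
```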
The documentation says:
Using Binary Collations
The following considerations will help you to decide whether old or new binary collations are appropriate for your Microsoft SQL Server implementation. Support for both BIN and BIN2 collations will continue in future SQL Server releases.
Binary collations sort data based on the sequence of coded values defined in a particular code page. A binary collation in SQL Server defines the language locale and the ANSI code page to be used, enforcing a binary sort order. Binary collations are useful in achieving improved application performance due to their relative simplicity.
Previous binary collations in SQL Server performed an incomplete code-point-to-code-point comparison for Unicode data, in that older SQL Server binary collations compared the first character as WCHAR, followed by a byte-by-byte comparison. For backward compatibility reasons, existing binary collation semantics will not be changed.
Guidelines for Using Binary Collations
If your Microsoft SQL Server 2005 applications interact with older versions of SQL Server that use binary collations, continue to use binary. Binary collations may be a more suitable choice for mixed environments.
Guidelines for Using BIN2 Collations
Binary collations in this release of SQL Server include a new set of pure code-point comparison collations. Customers can choose to migrate to the new binary collations to take advantage of true code-point comparisons, and they should utilize the new binary collations for development of new applications. The new BIN2 suffix identifies collation names that implement the new code-point collation semantics. In addition, a new comparison flag is added corresponding to BIN2 for the new binary sort. Advantages include simpler application development and clearer semantics.
I.e., with respect to CHARINDEX, a BIN2 collation is equivalent to using Ordinal comparison in C#.
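The documentation's note that older BIN collations compared the first character as a WCHAR and then the rest byte by byte can be illustrated with two sort keys. This is a sketch that assumes nvarchar data is stored as UTF-16LE (as on x86) and covers only BMP characters; `bin_key` and `bin2_key` are hypothetical names, not SQL Server APIs. The two strings below sort in opposite orders because U+00FF < U+0100 as code points, while their second characters encode to the bytes FF 00 and 00 01 respectively.

```python
def bin2_key(s: str) -> list:
    """BIN2-style key: compare the strings code point by code point."""
    return [ord(ch) for ch in s]


def bin_key(s: str) -> tuple:
    """Old BIN-style key: first character as a 16-bit value, then the rest byte by byte."""
    if not s:
        return (-1, b"")  # empty string sorts first
    # Remaining characters as UTF-16LE bytes, compared lexicographically as bytes.
    return (ord(s[0]), s[1:].encode("utf-16-le"))


words = ["a\u00ff", "a\u0100"]      # 'aÿ' and 'aĀ'
print(sorted(words, key=bin2_key))  # 'aÿ' first: U+00FF < U+0100
print(sorted(words, key=bin_key))   # 'aĀ' first: byte 0x00 < byte 0xFF
```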
This isn't specific to SQL Server.
In C#,
string.Compare("þ", "th", false, new System.Globalization.CultureInfo(1033))
returns 0, indicating that the strings compare as equal.
Or, in Notepad, clicking "Replace all" below
results in
In SQL Server, collations whose names don't start with "SQL" use Windows collation rules.
For most locales (Icelandic being the exception), the thorn character þ expands to th.
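To see why position 1 is reported, here is a toy model, a sketch under stated assumptions rather than how Windows collations are actually implemented. It case-folds both strings to mimic CI, expands þ to th using a one-entry, hypothetical `EXPANSIONS` table, and returns the start of the first substring of the searched text that compares equal to the pattern. It does not model locale differences (such as the Icelandic exception) or any other expansion rule.

```python
# Hypothetical one-entry expansion table; real Windows collations have many rules
# and locale-specific exceptions (the answer notes Icelandic as one).
EXPANSIONS = {"\u00fe": "th"}  # þ -> th


def normalize(s: str) -> str:
    """Toy comparison key: case-fold (models CI), then apply EXPANSIONS."""
    folded = s.casefold()  # 'Þ' folds to 'þ', 'T' folds to 't'
    return "".join(EXPANSIONS.get(ch, ch) for ch in folded)


def charindex_linguistic(pattern: str, text: str) -> int:
    """1-based start of the first substring of text equal to pattern under normalize(), else 0."""
    target = normalize(pattern)
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if normalize(text[start:end]) == target:
                return start + 1
    return 0


THORN = chr(254)    # þ
Y_ACUTE = chr(253)  # ý

print(charindex_linguistic(THORN, "T" + THORN))   # 2
print(charindex_linguistic(THORN, "Th"))          # 1: 'Th' normalizes to 'th'
print(charindex_linguistic(Y_ACUTE, "Th"))        # 0
print(charindex_linguistic(THORN, "Th" + THORN))  # 1: the 'Th' prefix matches first
```

Under this model the results line up with the Latin1_General_CI_AS results in the question, including the two that looked wrong.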
Michael S. Kaplan's post Every rose has it's Þ..... has more information about this specific case. That blog has a wealth of information about Windows collations. Expansions are described in more detail here.
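If you don't want these semantics, you need to use a collation that has the semantics you do want (perhaps a SQL or binary collation), possibly via an explicit COLLATE clause.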