CHARINDEX returns the wrong result in some COLLATIONs when searching for thorn (character 254)

Overview

CHARINDEX occasionally returns the wrong value when a collation such as this one is in use:

Latin1_General_CI_AS 

but it works as expected with a collation such as:

SQL_Latin1_General_CP1_CI_AS

This was encountered on MS SQL Server 2008 R2 and SQL Server 2016.

Example

Assume the database collation is:

Latin1_General_CI_AS
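
To confirm which collation is actually in effect before testing, a query along these lines may help. It is only a sketch: the column aliases are made up for illustration, and the output depends on the instance it runs against.

```sql
-- Collation of the current database and of the server instance.
SELECT
    DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation,
    SERVERPROPERTY('Collation')                AS server_collation;

-- Descriptions of the two collations mentioned above.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name IN (N'Latin1_General_CI_AS', N'SQL_Latin1_General_CP1_CI_AS');
```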

Is there a known bug in the Latin1... collations?

This returns the correct results:

SELECT CHARINDEX(NCHAR(254) COLLATE Latin1_General_BIN2, N'Tþ' COLLATE Latin1_General_BIN2);  -- 2: þ is the second character
SELECT CHARINDEX(NCHAR(254) COLLATE Latin1_General_BIN2, N'Th' COLLATE Latin1_General_BIN2);  -- 0: no þ in 'Th'
SELECT CHARINDEX(NCHAR(253) COLLATE Latin1_General_BIN2, N'Th' COLLATE Latin1_General_BIN2);  -- 0: no ý (character 253) in 'Th'
SELECT CHARINDEX(NCHAR(254) COLLATE Latin1_General_BIN2, N'Thþ' COLLATE Latin1_General_BIN2); -- 3: þ is the third character

The documentation says:

Using Binary Collations

The following considerations will help you to decide whether old or new binary collations are appropriate for your Microsoft SQL Server implementation. Support for both BIN and BIN2 collations will continue in future SQL Server releases.

Binary collations sort data based on the sequence of coded values defined in a particular code page. A binary collation in SQL Server defines the language locale and the ANSI code page to be used, enforcing a binary sort order. Binary collations are useful in achieving improved application performance due to their relative simplicity.

Previous binary collations in SQL Server performed an incomplete code-point-to-code-point comparison for Unicode data, in that older SQL Server binary collations compared the first character as WCHAR, followed by a byte-by-byte comparison. For backward compatibility reasons, existing binary collation semantics will not be changed.

Guidelines for Using Binary Collations

If your Microsoft SQL Server 2005 applications interact with older versions of SQL Server that use binary collations, continue to use binary. Binary collations may be a more suitable choice for mixed environments.

Guidelines for Using BIN2 Collations

Binary collations in this release of SQL Server include a new set of pure code-point comparison collations. Customers can choose to migrate to the new binary collations to take advantage of true code-point comparisons, and they should utilize the new binary collations for development of new applications. The new BIN2 suffix identifies collation names that implement the new code-point collation semantics. In addition, a new comparison flag is added corresponding to BIN2 for the new binary sort. Advantages include simpler application development and clearer semantics.

That is, as far as CHARINDEX is concerned, a BIN2 collation is equivalent to using Ordinal comparison in C#.

This is not specific to SQL Server.

In C#

string.Compare("þ", "th", false, new System.Globalization.CultureInfo(1033))

returns 0, meaning the two strings compare as equal.

Or, in Notepad, click "Replace all" on the below

which results in

In SQL Server, collations whose names do not start with "SQL" use Windows collation rules.

For most locales (Icelandic being an exception), the thorn character þ expands to th.
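
A minimal sketch of how this expansion can be observed from T-SQL, assuming the Latin1_General_CI_AS and Latin1_General_BIN2 collations used earlier. The column aliases are illustrative only, and no output is shown because the queries were not run for this write-up. If the expansion applies, the first query would be expected to treat þ and th as a match, while the second compares code points only.

```sql
-- Equality and CHARINDEX under a Windows collation (linguistic comparison).
SELECT
    CASE WHEN NCHAR(254) = N'th' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END          AS windows_equality,
    CHARINDEX(NCHAR(254) COLLATE Latin1_General_CI_AS,
              N'Th' COLLATE Latin1_General_CI_AS)   AS windows_charindex;

-- The same checks under a BIN2 collation (pure code-point comparison).
SELECT
    CASE WHEN NCHAR(254) = N'th' COLLATE Latin1_General_BIN2
         THEN 'equal' ELSE 'different' END          AS bin2_equality,
    CHARINDEX(NCHAR(254) COLLATE Latin1_General_BIN2,
              N'Th' COLLATE Latin1_General_BIN2)    AS bin2_charindex;
```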

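There is more information about this specific case in Michael S. Kaplan's post "Every rose has it's Þ.....". That blog has a wealth of information about Windows collations. Expansions are described in more detail here.

If you don't want these semantics, you need to use a collation that has the semantics you do want (probably a SQL or binary collation), possibly via an explicit COLLATE clause.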
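
As a sketch of the explicit COLLATE approach, the query below overrides a column's collation for a single expression. The table variable @words and its column word are hypothetical, invented only for this illustration, and Latin1_General_BIN2 is just one possible choice of target collation.

```sql
-- Hypothetical table variable whose column uses a Windows collation.
DECLARE @words TABLE (word nvarchar(20) COLLATE Latin1_General_CI_AS);

INSERT INTO @words (word)
VALUES (N'Th'), (N'T' + NCHAR(254)), (N'Th' + NCHAR(254));

SELECT
    word,
    -- Uses the column's own collation, so linguistic rules apply.
    CHARINDEX(NCHAR(254), word) AS pos_column_collation,
    -- The explicit COLLATE overrides the column collation for this expression only.
    CHARINDEX(NCHAR(254), word COLLATE Latin1_General_BIN2) AS pos_bin2
FROM @words;
```

Note that a BIN2 collation is also case-sensitive and accent-sensitive, so a search for þ will no longer find the uppercase Þ (character 222). If case-insensitive matching still matters, one option is to apply LOWER() to both arguments before comparing.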