Why does my MySQL instance think 'ī' and 'i' are equal characters?

Question

I'm running into a uniqueness constraint problem in MySQL. I have two different characters:

ī, which is Unicode code point U+012B (decimal 299, outside the ASCII range)
i, which is ASCII decimal 105

Why does MySQL consider these equal? When I run the following:

SELECT STRCMP('ī', 'i')

I get a return value of 0.

As requested, here is some information about my environment:

mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_unicode_ci   |
+--------------------------+----------------------+

Answer 1

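Your database's default collation is case-insensitive: that is what the ci suffix at the end of @@collation_database means. A _ci collation whose name has no _as or _ai component is also accent-insensitive, so in this collation diacritics are folded for most languages. From the documentation: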

To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect of this in comparisons or searches, see Section 10.8.6, “Examples of the Effect of Collation”):

Ä = A
Ö = O
Ü = U

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html

I would not recommend changing this default unless you find a reason to do so.
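To see the folding described above, one can compare the stored bytes with the sort weights the collation assigns. This is only a sketch: it assumes MySQL 8.0 and a connection character set of utf8mb4 (otherwise the COLLATE clauses are rejected), and the column aliases are made up for illustration.

```sql
-- The raw bytes differ, so the two characters are stored differently.
SELECT HEX('ī') AS macron_bytes, HEX('i') AS plain_bytes;

-- Sort weights under the collation from the question; identical weights
-- are what make the comparison report equality.
SELECT HEX(WEIGHT_STRING('ī' COLLATE utf8mb4_unicode_ci)) AS macron_weight,
       HEX(WEIGHT_STRING('i' COLLATE utf8mb4_unicode_ci)) AS plain_weight;

-- The same comparison under an accent- and case-sensitive collation
-- (available from MySQL 8.0).
SELECT STRCMP('ī', 'i' COLLATE utf8mb4_0900_as_cs) AS accent_sensitive_cmp;
```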

Answer 2

Characters are equal if the current collation defines them as equal.

Every string has a character set and a collation. If the string comes from a table, the table or column definition determines the collation.

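If the string is a literal you use in an SQL expression (as in your example), its character set and collation default to the session values of the MySQL options character_set_connection and collation_connection.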
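A quick way to check which defaults apply to literals in the current session is sketched below; it relies only on the two session variables named above and on the built-in CHARSET() and COLLATION() functions.

```sql
-- Session defaults that apply to string literals.
SELECT @@character_set_connection, @@collation_connection;

-- Character set and collation MySQL actually assigns to a given literal.
SELECT CHARSET('i'), COLLATION('i');
```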

You can override the session value for a given string literal with a COLLATE clause:

mysql> SELECT STRCMP('ī', 'i' COLLATE utf8mb4_bin);
+---------------------------------------+
| STRCMP('ī', 'i' COLLATE utf8mb4_bin)  |
+---------------------------------------+
|                                     1 |
+---------------------------------------+

See https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html
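Since the original problem is a uniqueness constraint, and the column's collation is what the index uses for comparison, one possible approach is to give that column a binary (or accent-sensitive) collation. The sketch below is an assumption-laden example, not the asker's schema: the table people, the column name, and the index uq_people_name are hypothetical.

```sql
-- Hypothetical table; all names here are for illustration only.
CREATE TABLE people (
  name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  UNIQUE KEY uq_people_name (name)
);

-- With a binary collation the unique index compares code points,
-- so these two values no longer collide.
INSERT INTO people (name) VALUES ('ī'), ('i');

-- Check which collation each column of the table actually uses.
SELECT COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'people';
```

Note the trade-off: utf8mb4_bin is also case-sensitive, so 'A' and 'a' become distinct values under the same constraint.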


mysql

collation