How come this character sequence comparison returns true?

Question

Why does the following SQL statement return true in MySQL?

SELECT 'SeP' = 'sęp';

mysql> select 'SeP' = 'sęp';
+----------------+
| 'SeP' = 'sęp'  |
+----------------+
|              1 |
+----------------+
1 row in set (0.00 sec)

The character set and collations of my database are as follows.

mysql> select @@character_set_database, @@collation_database, @@collation_connection;
+--------------------------+----------------------+------------------------+
| @@character_set_database | @@collation_database | @@collation_connection |
+--------------------------+----------------------+------------------------+
| utf8mb4                  | utf8mb4_general_ci   | utf8_general_ci        |
+--------------------------+----------------------+------------------------+
1 row in set (0.00 sec)

Answer 1

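Your collation is case-insensitive (the ci suffix at the end of the collation name indicates this) and general, so MySQL compares the two strings in a case-insensitive and, as is usual for general collations, accent-insensitive way. The two strings are therefore treated as equal.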
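To check which collation actually applies to a comparison of two string literals, something like the following can be run. It is a minimal sketch that uses only the built-in CHARSET() and COLLATION() functions. String literals take their character set and collation from the connection, so with the settings shown in the question this would be expected to report utf8 and utf8_general_ci rather than the database defaults.

```sql
-- Character set and collation that MySQL assigns to a plain string literal.
-- These follow character_set_connection / collation_connection,
-- not the database-level defaults.
SELECT CHARSET('SeP') AS literal_charset,
       COLLATION('SeP') AS literal_collation;

-- All collation-related session variables, for comparison.
SHOW VARIABLES LIKE 'collation%';
```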

The MySQL manual section on Case sensitivity in string searches says:

For nonbinary strings (CHAR, VARCHAR, TEXT), string searches use the collation of the comparison operands. For binary strings (BINARY, VARBINARY, BLOB), comparisons use the numeric values of the bytes in the operands; this means that for alphabetic characters, comparisons will be case sensitive.

A comparison between a nonbinary string and binary string is treated as a comparison of binary strings.

Simple comparison operations (>=, >, =, <, <=, sorting, and grouping) are based on each character's “sort value.” Characters with the same sort value are treated as the same character. For example, if e and é have the same sort value in a given collation, they compare as equal.
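The "sort value" idea from the manual can be inspected with WEIGHT_STRING(), which returns the weights a collation assigns to a string. This is only a sketch. It assumes a client that sends UTF-8, and it switches the session to utf8mb4 first so that the utf8mb4 collations are valid for the literals. If two characters compare as equal under a collation, their hex weights in that collation should be identical.

```sql
-- Make string literals utf8mb4 for this session so the
-- COLLATE clauses below match their character set.
SET NAMES utf8mb4;

-- Compare the weights of 'e' and 'ę' under a general collation
-- and under the Polish collation.
SELECT
  HEX(WEIGHT_STRING('e' COLLATE utf8mb4_general_ci)) AS e_general,
  HEX(WEIGHT_STRING('ę' COLLATE utf8mb4_general_ci)) AS e_ogonek_general,
  HEX(WEIGHT_STRING('e' COLLATE utf8mb4_polish_ci))  AS e_polish,
  HEX(WEIGHT_STRING('ę' COLLATE utf8mb4_polish_ci))  AS e_ogonek_polish;
```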

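To force case sensitivity, use a case-sensitive (_cs) or binary (_bin) collation. To force accent sensitivity, you need a language-specific collation (utf8mb4_polish_xxx in your case) or a binary collation. General collations rarely distinguish between accented and unaccented characters.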
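A collation can also be forced for a single comparison with the COLLATE clause, without changing any defaults. The sketch below assumes a UTF-8 client and switches the session to utf8mb4 first. It uses utf8mb4_bin for the case-sensitive check because, as far as I know, MySQL 5.7 does not ship a _cs collation for utf8mb4.

```sql
-- Make string literals utf8mb4 so the utf8mb4 collations apply to them.
SET NAMES utf8mb4;

-- Binary collation: compares code points, so it is both
-- case-sensitive and accent-sensitive.
SELECT 'SeP' = 'sęp' COLLATE utf8mb4_bin AS bin_accent;  -- 0
SELECT 'SeP' = 'sep' COLLATE utf8mb4_bin AS bin_case;    -- 0

-- Polish collation: still case-insensitive, but treats ę as its own letter.
SELECT 'SeP' = 'sęp' COLLATE utf8mb4_polish_ci AS polish_accent;
SELECT 'SeP' = 'sep' COLLATE utf8mb4_polish_ci AS polish_case;
```

The explicit COLLATE on one operand takes precedence over the connection collation of the other, so only that one comparison is affected.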

Answer 2

The collation utf8mb4_polish_ci treats these as "separate letters": ą ć ę ń ś ź ż

For example, a < ą < b. In most other collations, a = ą < b.

mysql> SET NAMES utf8mb4 COLLATE utf8mb4_polish_ci;

mysql> SELECT 'SeP' = 'sęp';
+----------------+
| 'SeP' = 'sęp'  |
+----------------+
|              0 |
+----------------+
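If changing the connection collation is not desirable, the collation can instead be declared on the column. The following is a hedged sketch. The table name letters_demo and column ch are made up for illustration, and a UTF-8 client is assumed.

```sql
-- Send and interpret literals as utf8mb4 for this session.
SET NAMES utf8mb4;

-- Hypothetical demo table whose column uses the Polish collation.
CREATE TEMPORARY TABLE letters_demo (
  ch VARCHAR(4) CHARACTER SET utf8mb4 COLLATE utf8mb4_polish_ci
);

INSERT INTO letters_demo (ch) VALUES ('b'), ('ą'), ('a');

-- Sorting uses the column collation; given a < ą < b,
-- the expected order is a, ą, b.
SELECT ch FROM letters_demo ORDER BY ch;

-- The column collation also wins over the literal's collation,
-- so this should match only the plain 'a' row.
SELECT ch FROM letters_demo WHERE ch = 'a';

DROP TEMPORARY TABLE letters_demo;
```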

For details on how the various collations differ, see this.

mysql

collation

mysql-5.7