MySQL compares accented vs unaccented characters as the same?

Question

This makes no sense to me. Can anyone explain? I would expect the two column values to be different, so

select * from a1 where f1 = f2;

should find no rows. But...

mysql> create table a1 (f1 varchar(63), f2 varchar(63));
Query OK, 0 rows affected (0.00 sec)

mysql> show create table a1 \G
*************************** 1. row ***************************
       Table: a1
Create Table: CREATE TABLE `a1` (
  `f1` varchar(63) DEFAULT NULL,
  `f2` varchar(63) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> 
mysql> insert into a1 values ('EFBBBFD187D0B5D0BBD0BED0B2D0B5D0BA', 'EFBBBFD187D0B5D0BBD0BED0B2D0B5CC81D0BA');
Query OK, 1 row affected (0.02 sec)

mysql> update a1 set f1 = unhex(f1);
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update a1 set f2 = unhex(f2);
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from a1;
+-------------------+---------------------+
| f1                | f2                  |
+-------------------+---------------------+
| человек          | челове́к           |
+-------------------+---------------------+
1 row in set (0.00 sec)

mysql> 
mysql> 
mysql> select * from a1 where f1 = f2;
+-------------------+---------------------+
| f1                | f2                  |
+-------------------+---------------------+
| человек          | челове́к           |
+-------------------+---------------------+
1 row in set (0.00 sec)

mysql> select * from a1 where hex(f1) = hex(f2);
Empty set (0.00 sec)

mysql>

Answer 1

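Character equivalence is defined by the collation that the columns in question use. A collation defines every pair of characters as equal, less than, or greater than, and that ordering is used for both comparison and sorting.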
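As a small illustration of how the same pair of strings can be equal under one collation and different under another, the sketch below compares 'a' with 'á'. The hex literal X'C3A1' is the UTF-8 encoding of á (U+00E1), written that way so the result does not depend on the client's encoding. It assumes MySQL 8.0, where the utf8mb4_0900_* collations exist.

```sql
-- Compare 'a' with 'á' (UTF-8 bytes C3 A1) under two collations.
-- COLLATE binds tighter than '=', so it applies to the right-hand operand
-- and the whole comparison then uses that collation.
SELECT
    _utf8mb4'a' = _utf8mb4 X'C3A1' COLLATE utf8mb4_0900_ai_ci AS accent_insensitive,
    _utf8mb4'a' = _utf8mb4 X'C3A1' COLLATE utf8mb4_0900_as_cs AS accent_sensitive;
```

The first column should come back as 1 (equal) and the second as 0 (not equal).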

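Your table uses utf8mb4_0900_ai_ci as its default collation, and that applies to both columns, because neither column defines a collation of its own to override the table default.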
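One way to confirm which collation each column really uses is to ask information_schema. This is a sketch that assumes a1 lives in the currently selected database:

```sql
-- List the character set and collation of every column in a1.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'a1';
```

For the table in the question, both f1 and f2 should report utf8mb4_0900_ai_ci, inherited from the table default.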

It is common for a collation to treat accented characters as equivalent to their unaccented versions.

If you want a different collation, you can choose one.
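A minimal sketch of changing the collation permanently. The choice of utf8mb4_0900_as_cs (accent sensitive, case sensitive) is only an assumption for illustration; pick whichever collation matches the comparison rules you need.

```sql
-- Option 1: change the collation of the two columns individually.
ALTER TABLE a1
    MODIFY f1 varchar(63) COLLATE utf8mb4_0900_as_cs DEFAULT NULL,
    MODIFY f2 varchar(63) COLLATE utf8mb4_0900_as_cs DEFAULT NULL;

-- Option 2: change the table default and convert all of its
-- character columns in one statement.
ALTER TABLE a1 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs;
```

Either statement rebuilds the table, and any index on the affected columns is rebuilt under the new ordering, so it is worth trying on a copy first.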

Answer 2

The first 3 bytes, EFBBBF, are a "BOM" (byte order mark), which signals that the text is encoded as UTF-8.

Apart from "CC81 -- NSM COMBINING ACUTE ACCENT", the rest looks like Cyrillic: челове́к
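To see that the two values differ only by that one combining character, it can help to compare byte lengths and character lengths side by side. A sketch against the a1 table from the question:

```sql
-- LENGTH() counts bytes, CHAR_LENGTH() counts characters (code points).
SELECT HEX(LEFT(f1, 1)) AS f1_first_char,   -- the BOM, U+FEFF
       LENGTH(f1)       AS f1_bytes,
       CHAR_LENGTH(f1)  AS f1_chars,
       LENGTH(f2)       AS f2_bytes,
       CHAR_LENGTH(f2)  AS f2_chars
FROM a1;
```

With the values inserted in the question, f1 is 17 bytes and 8 characters (the BOM plus seven Cyrillic letters), while f2 is 19 bytes and 9 characters; the extra two bytes are CC81, the combining acute accent.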

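Some collations (including utf8mb4_0900_ai_ci) handle "combining accents", and some do not. The "ai" in the name means "accent insensitive", so the combining accent in f2 is ignored when the two values are compared.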
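The suffixes are easier to read with the related collations listed next to each other. A sketch, assuming MySQL 8.0; the exact set returned depends on the server version:

```sql
-- List the generic UCA 9.0.0 collations for utf8mb4.
-- Suffixes: ai/as = accent insensitive/sensitive,
--           ci/cs = case insensitive/sensitive, bin = binary.
-- ('_' is a single-character wildcard in LIKE; it still matches a literal '_'.)
SHOW COLLATION
WHERE Charset = 'utf8mb4'
  AND Collation LIKE 'utf8mb4_0900%';
```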

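I would understand this comparing equal for a "Latin" e. I don't know the rule for CYRILLIC SMALL LETTER IE, which looks the same (е) but is encoded differently.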

You probably want COLLATE utf8mb4_0900_as_ci, which is "accent sensitive and case insensitive".
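If changing the table is not an option, the collation can also be applied to a single comparison. A sketch using the query from the question:

```sql
-- Override the column collation for this one comparison only.
SELECT * FROM a1 WHERE f1 = f2 COLLATE utf8mb4_0900_as_ci;
```

Because the comparison is now accent sensitive, the row from the question should no longer match. Note that a per-query COLLATE generally prevents the use of an index built under the column's own collation.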
