Why does --default-character-set=utf8mb4 have no impact on the client connection?
According to https://dev.mysql.com/doc/refman/5.6/en/charset-connection.html, when I connect to a MySQL 5.6 server with a MySQL 8.0 client using the following command:
/usr/bin/mysql -h ${DB_HOST} -u ${DB_USER} -p --default-character-set=utf8mb4
I expect the client to establish a utf8mb4 connection to the server. However, the connection is set to latin1:
mysql> SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE VARIABLE_NAME IN (
'character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection' )
ORDER BY VARIABLE_NAME;
+--------------------------+-------------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+--------------------------+-------------------+
| CHARACTER_SET_CLIENT | latin1 |
| CHARACTER_SET_CONNECTION | latin1 |
| CHARACTER_SET_RESULTS | latin1 |
| COLLATION_CONNECTION | latin1_swedish_ci |
+--------------------------+-------------------+
Using a different character set, for example:
/usr/bin/mysql -h ${DB_HOST} -u ${DB_USER} -p --default-character-set=koi8r
results in the connection using the character set the client supplied:
mysql> SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE VARIABLE_NAME IN ( 'character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection' ) ORDER BY VARIABLE_NAME;
+--------------------------+------------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+--------------------------+------------------+
| CHARACTER_SET_CLIENT | koi8r |
| CHARACTER_SET_CONNECTION | koi8r |
| CHARACTER_SET_RESULTS | koi8r |
| COLLATION_CONNECTION | koi8r_general_ci |
+--------------------------+------------------+
The only way I can change the client connection is to run charset utf8mb4 or SET NAMES utf8mb4 after connecting to the server.
mysql> SET NAMES utf8mb4;
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE VARIABLE_NAME IN ( 'character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection' ) ORDER BY VARIABLE_NAME;
+--------------------------+--------------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+--------------------------+--------------------+
| CHARACTER_SET_CLIENT | utf8mb4 |
| CHARACTER_SET_CONNECTION | utf8mb4 |
| CHARACTER_SET_RESULTS | utf8mb4 |
| COLLATION_CONNECTION | utf8mb4_general_ci |
+--------------------------+--------------------+
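Why does --default-character-set=utf8mb4 not work? I want to use other client tools such as mysqldump and mysqlimport, but without this flag taking effect I get latin1 encoding rather than utf8mb4. Changing the server defaults is not an option in this case; it has to be done from the client.
More information: I am trying this from an Ubuntu 20.04 WSL2 installation, so no 5.6 or 5.7 client is available. However, the 5.6 and 5.7 Windows mysql clients do honor --default-character-set=utf8mb4, while the 8.0 Windows client behaves the same as the WSL2 client.
This behavior is explained in the 8.0 documentation: https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html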
The same problem occurs in a more subtle context: When the client tells the server to use a character set that the server recognizes,
but the default collation for that character set on the client side is
not known on the server side. This occurs, for example, when a MySQL
8.0 client wants to connect to a MySQL 5.7 server using utf8mb4 as the client character set. A client that specifies
--default-character-set=utf8mb4 is able to connect to the server. However, as in the previous example, the server falls back to its
default character set and collation, not what the client requested:
mysql> SHOW SESSION VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
...
| character_set_results | latin1 |
...
+--------------------------+--------+
mysql> SHOW SESSION VARIABLES LIKE 'collation_connection';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
+----------------------+-------------------+
Why does this occur? After all, utf8mb4 is known to the 8.0 client and the 5.7 server, so
both of them recognize it. To understand this behavior, it is
necessary to understand that when the client tells the server which
character set it wants to use, it really tells the server the default
collation for that character set. Therefore, the aforementioned
behavior occurs due to a combination of factors:
- The default collation for utf8mb4 differs between MySQL 5.7 and 8.0 (utf8mb4_general_ci for 5.7, utf8mb4_0900_ai_ci for 8.0).
- When the 8.0 client requests a character set of utf8mb4, what it sends to the server is the default 8.0 utf8mb4 collation; that is, the utf8mb4_0900_ai_ci.
- utf8mb4_0900_ai_ci is implemented only as of MySQL 8.0, so the 5.7 server does not recognize it.
- Because the 5.7 server does not recognize utf8mb4_0900_ai_ci, it cannot satisfy the client character set request, and falls back to its default character set and collation (latin1 and latin1_swedish_ci).
In this case, the client can still use utf8mb4 by issuing a SET NAMES
'utf8mb4' statement after connecting. The resulting collation is the
5.7 default utf8mb4 collation; that is, utf8mb4_general_ci. If the client additionally wants a collation of utf8mb4_0900_ai_ci, it cannot
achieve that because the server does not recognize that collation. The
client must either be willing to use a different utf8mb4 collation, or
connect to a server from MySQL 8.0 or higher.
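The fallback described in the quoted documentation can be illustrated with a small toy model. This is only a sketch, not MySQL source code: the Server class and the handshake and set_names functions are hypothetical names invented for the illustration, the collation tables list only the four collations that appear in this thread, and the numeric IDs (which the quoted text does not mention) are the ones SHOW COLLATION reports for them. It assumes the old server's configured default is latin1_swedish_ci, as in the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for a MySQL server's collation knowledge."""
    collations: dict        # collation id -> (character set, collation name)
    charset_defaults: dict  # character set -> this version's default collation id
    fallback_id: int = 8    # assumed configured default: latin1_swedish_ci


# Collations known to both old (5.6/5.7) and new (8.0) servers.
COMMON = {
    7: ("koi8r", "koi8r_general_ci"),
    8: ("latin1", "latin1_swedish_ci"),
    45: ("utf8mb4", "utf8mb4_general_ci"),
}

OLD_SERVER = Server(
    collations=COMMON,
    charset_defaults={"koi8r": 7, "latin1": 8, "utf8mb4": 45},
)
NEW_SERVER = Server(
    collations={**COMMON, 255: ("utf8mb4", "utf8mb4_0900_ai_ci")},
    charset_defaults={"koi8r": 7, "latin1": 8, "utf8mb4": 255},
)

# A client maps --default-character-set to its own version's default collation.
CLIENT_57_DEFAULTS = OLD_SERVER.charset_defaults
CLIENT_80_DEFAULTS = NEW_SERVER.charset_defaults


def handshake(client_defaults, charset, server):
    """At connect time the client sends a collation id, not a charset name.
    An id the server does not know makes it fall back to its own default."""
    requested_id = client_defaults[charset]
    return server.collations.get(requested_id, server.collations[server.fallback_id])


def set_names(charset, server):
    """SET NAMES sends the charset name, so the server picks its own default
    collation for that charset and the request succeeds."""
    return server.collations[server.charset_defaults[charset]]


if __name__ == "__main__":
    print(handshake(CLIENT_80_DEFAULTS, "utf8mb4", OLD_SERVER))  # falls back to latin1
    print(handshake(CLIENT_80_DEFAULTS, "koi8r", OLD_SERVER))    # koi8r is honored
    print(handshake(CLIENT_57_DEFAULTS, "utf8mb4", OLD_SERVER))  # older client works
    print(set_names("utf8mb4", OLD_SERVER))                      # utf8mb4_general_ci
```

The model reproduces the three observations in the question: koi8r is honored, utf8mb4 from an 8.0 client is not, and SET NAMES utf8mb4 yields utf8mb4_general_ci. For the interactive mysql client, the same statement can be sent automatically at connect time with its --init-command option, for example --init-command="SET NAMES utf8mb4". Whether mysqldump and mysqlimport in a given 8.0 release accept that option should be checked against their --help output.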