Black diamonds and question marks persist after setting the database to utf8mb4

Question

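This is about MySQL and the character encoding of a Java JDBC connection. The database has been converted to utf8mb4 with the utf8mb4_unicode_ci collation. The output below is the result of SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; run over the JDBC connection.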

+--------------------------+--------------------+
|      Variable_name       |       Value        |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

From MySQL Workbench, and from a terminal connected directly to the database, I can see the Unicode character í with its correct hex value, c3 ad:

+------------------------------+
| HEX(location.name)           |
+------------------------------+
| C3AD                         |
+------------------------------+

JDBC connection settings: useUnicode=true&characterEncoding=UTF-8

Using HikariCP, configured as follows:

config.addDataSourceProperty("useUnicode", "true");
config.addDataSourceProperty("characterEncoding", "utf-8");
config.setConnectionInitSql("SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci");

Using mysql-connector-java:8.0.11.

When I query the relevant table over the JDBC connection, the í character is returned as � in Postman, and 💩 is returned as ? in Postman.

According to a linked discussion, this leads me to believe that my connection is not using UTF-8 when reading. How can I detect that?

The database and the application have been restarted where needed to apply the settings.
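One way to narrow down where the damage happens is to hex-dump the string right after it comes out of the ResultSet and compare it with the HEX(location.name) output above. The following is a minimal sketch, not a definitive diagnostic: the class Utf8Hex and its methods are hypothetical names, and the literal in main stands in for a value that would really come from rs.getString(...).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for inspecting a String after it has been read over JDBC. */
final class Utf8Hex {

    /** Upper-case hex of the string's UTF-8 bytes, comparable to MySQL's HEX(column). */
    static String hexUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** True if the string already contains U+FFFD, meaning it was damaged before this point. */
    static boolean containsReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // Assumption: in the real application this value comes from rs.getString("name").
        String name = "\u00ED"; // the character í
        System.out.println(hexUtf8(name));
        System.out.println(containsReplacementChar(name));
    }
}
```

If the hex printed inside the application matches what MySQL reports (C3AD for í), the JDBC read is probably fine and the corruption happens later, for example when the HTTP response is written.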

Answer 1

characterEncoding=utf-8 is not compatible with utf8mb4. Use character_set_server=utf8mb4 in the JDBC URL instead, or config.addDataSourceProperty("character_set_server", "utf8mb4");. Do not use characterEncoding at all.

From the MySQL Connector/J Developer Guide → Using Character Sets and Unicode → Setting the Character Encoding:

… to use the 4-byte UTF-8 character set with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding out of the Connector/J connection string.

And directly below that:

Warning

In order to use the utf8mb4 character set for the connection, the server MUST be configured with character_set_server=utf8mb4; if that is not the case, when UTF-8 is used for characterEncoding in the connection string, it will map to the MySQL character set name utf8, which is an alias for utf8mb3.

Answer 2

In addition to following VGR's advice, I found that I was sending the response through a plain PrintWriter, which was not writing it as UTF-8. Instead of

PrintWriter out = response.getWriter();
out.println(res);
out.flush();

I now use

response.getOutputStream().write(res.toString().getBytes("UTF-8"));
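The effect of the writer's charset can be reproduced without a servlet container. The sketch below is only an illustration under one assumption: that the container falls back to ISO-8859-1 for getWriter() when no response encoding has been set, which is the default named in the Servlet specification. WriterCharsetDemo and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: the same text written through PrintWriters bound to different charsets. */
final class WriterCharsetDemo {

    /** Writes text through a PrintWriter that encodes with the given charset; returns the raw bytes. */
    static byte[] writeWith(Charset cs, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, cs));
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    /** Upper-case hex dump of a byte array. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String iAcute = "\u00ED";      // í
        String poo = "\uD83D\uDCA9";   // U+1F4A9
        // ISO-8859-1 stands in for the assumed container default.
        System.out.println(hex(writeWith(StandardCharsets.ISO_8859_1, iAcute)));
        System.out.println(hex(writeWith(StandardCharsets.ISO_8859_1, poo)));
        System.out.println(hex(writeWith(StandardCharsets.UTF_8, iAcute)));
        System.out.println(hex(writeWith(StandardCharsets.UTF_8, poo)));
    }
}
```

Under that assumption, í leaves the server as the single byte ED, which a UTF-8 client cannot decode and shows as �, while the emoji has no ISO-8859-1 representation and is replaced by a literal ?. Calling response.setCharacterEncoding("UTF-8") before getWriter() is the usual alternative to writing raw bytes, though whether it fits depends on the application.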

Answer 3

"The í character is returned as �" and "💩 is returned as ?" are different problems.

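The former usually happens when the bytes for í are not encoded as UTF-8. Note that in MySQL, utf8mb3 and utf8mb4 encode that character, and every other European character, identically, so fixing the connection (as VGR discusses) may not fix it. The black diamond seems to appear only when the browser is not set to UTF-8 (Unicode).

"Pile of poo" works only with utf8mb4, not with utf8mb3. So, assuming the client correctly has hex F09F92A9, the connection parameters (see VGR's answer) may be causing the problem.

(More discussion is in the link you provided.)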
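The difference between the two symptoms can be checked with the standard library alone. This is a rough sketch rather than a description of what MySQL or Connector/J do internally: MojibakeDemo and its methods are hypothetical names, and fitsUtf8mb3 simply applies the rule that utf8mb3 stores at most three bytes per character.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of the two failure modes discussed above. */
final class MojibakeDemo {

    /** Decodes raw bytes as UTF-8; malformed input is replaced with U+FFFD. */
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Number of bytes the given code point occupies in UTF-8. */
    static int utf8ByteCount(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    /** utf8mb3 holds at most 3 bytes per character; 4-byte characters need utf8mb4. */
    static boolean fitsUtf8mb3(int codePoint) {
        return utf8ByteCount(codePoint) <= 3;
    }

    public static void main(String[] args) {
        // The latin1 byte for í is not valid UTF-8 on its own, so it decodes to U+FFFD.
        String broken = decodeAsUtf8(new byte[] {(byte) 0xED});
        System.out.println(Integer.toHexString(broken.codePointAt(0)));

        // The proper UTF-8 bytes C3 AD decode to U+00ED.
        String ok = decodeAsUtf8(new byte[] {(byte) 0xC3, (byte) 0xAD});
        System.out.println(Integer.toHexString(ok.codePointAt(0)));

        System.out.println(fitsUtf8mb3(0x00ED));   // í
        System.out.println(fitsUtf8mb3(0x1F4A9));  // pile of poo
    }
}
```

The first case is a decoding mismatch that affects even two-byte characters such as í, while the second concerns only characters that need four bytes, which is why the two symptoms can have separate causes.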


java

mysql

jdbc

utf8mb4

payara