Transforming Unicode characters to a string containing their u+[hexa] representation ("\u2030")

Question

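I am working with Java 8 and I18N. As I understand it, .properties files (and the I18N code that later consumes them) assume the ISO-8859-1 encoding. As a result, I run into trouble with characters that cannot be represented in that encoding.

Switching from a FileWriter to an OutputStreamWriter does not help, because the code on the other end cannot read those characters anyway.

I did come up with a working solution, but it is very inelegant.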

StringBuilder utfRepresentation = new StringBuilder();
for (int index = 0; index < input.length(); index++) {
    if (!Charset.forName("ISO-8859-1").newEncoder().canEncode(input.charAt(index))) {
        utfRepresentation.append("\\u");
        utfRepresentation.append(Integer.toHexString(input.codePointAt(index)));
    } else {
        utfRepresentation.append(input.charAt(index));
    }
}

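There are other things I still need to do, such as hoisting the encoder out of the loop instead of creating a new one on every iteration, but my questions are about something else entirely:

1) Is there a cleaner way to convert ‰ to \u2030?
2) What exactly is this U+2030? UTF-8/16?
3) Is there a better way to obtain the charset/encoder? Something that is not static? Can I get it from the file, or from the file reader/writer?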

Answer 1

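As a historical anomaly, .properties files are encoded in ISO-8859-1; you can use StandardCharsets.ISO_8859_1 (if not on Android).

However, you can use u-escaping, as in \u2030, for all other characters. Understand that this is the UTF-16 representation, stored in a single char (two bytes). Some Unicode symbols exceed the two-byte range and are encoded as a "surrogate" pair, that is, two consecutive chars.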
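To make the single-char versus surrogate-pair distinction concrete, here is a minimal sketch that prints the UTF-16 code units of a string. The class name SurrogateDemo and the helper codeUnits are made up for illustration; only standard JDK calls are used.

```java
public class SurrogateDemo {

    /** Returns the UTF-16 code units of s as space-separated 4-digit hex values. */
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // The cast matters: %X accepts integers, not a boxed Character.
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+2030 PER MILLE SIGN is in the Basic Multilingual Plane: one char.
        String perMille = "\u2030";
        System.out.println(perMille.length() + " char(s): " + codeUnits(perMille));

        // U+1F600 is outside the BMP: Java stores it as two chars (a surrogate pair).
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length() + " char(s): " + codeUnits(emoji));
    }
}
```

In a .properties file, each of those code units would get its own u-escape, so a character outside the BMP ends up as two escapes in a row.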

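When reading through a PropertyResourceBundle, every \uXXXX is decoded automatically.
You can set up a build step that converts UTF-8 template files into u-escaped .properties files, for example in Maven.
Sometimes a ListResourceBundle is more appropriate. It holds an array in Java code, and for international projects all Java sources can be set to UTF-8. It behaves differently, though: all strings are loaded at once.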
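The automatic decoding can be checked without any file on disk. The sketch below feeds properties text to a PropertyResourceBundle through a StringReader; the class name BundleDecodeDemo, the helper lookup, and the key rate.suffix are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleDecodeDemo {

    /** Parses propertiesText as a .properties document and returns the value for key. */
    static String lookup(String propertiesText, String key) {
        try {
            ResourceBundle bundle =
                    new PropertyResourceBundle(new StringReader(propertiesText));
            return bundle.getString(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape literal: the text that reaches
        // the parser is a backslash, the letter u, and the digits 2030.
        String text = "rate.suffix=\\u2030\n";
        String value = lookup(text, "rate.suffix");

        // The six-character escape has been decoded into a single char.
        System.out.println(value.length());
        System.out.println(Integer.toHexString(value.charAt(0)));
    }
}
```

The same decoding applies when the bundle is loaded from an ISO-8859-1 file on the class path, which is why u-escaped files are safe to ship.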

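But evidently you also want to write the .properties file from code, so it is not on the class path.

Here it is better to look at Properties.

The Properties class is ideal for this. It has an XML variant of the properties (instead of key-value lines) that uses UTF-8 by default. Traditional .properties files can also be read and written in another encoding (such as UTF-8) through load(Reader) and store(Writer).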
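As a rough illustration of the two ways to write the traditional format, the sketch below stores the same Properties object once through an OutputStream and once through a Writer. PropertiesStoreDemo, storeEscaped, and storeRaw are hypothetical names; in-memory streams stand in for real files to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesStoreDemo {

    /** Line format via an OutputStream: ISO-8859-1 bytes, other characters u-escaped. */
    static String storeEscaped(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // null means no custom comment; a timestamp comment line is still written.
            props.store(out, null);
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Same line format via a Writer: characters are passed through unescaped. */
    static String storeRaw(Properties props) {
        try {
            StringWriter out = new StringWriter();
            props.store(out, null);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("permille", "\u2030");

        System.out.println(storeEscaped(props)); // value appears as a u-escape
        System.out.println(storeRaw(props));     // value appears as the raw character
    }
}
```

With a real file, the Writer variant would typically wrap a FileOutputStream in an OutputStreamWriter with an explicit charset, and the reading side would then have to use load(Reader) with that same charset. The OutputStream variant already does the escaping that the hand-written loop in the question attempts.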

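If you still want to escape by hand, the loop can be simplified:

StringBuilder utfRepresentation = new StringBuilder();
for (int index = 0; index < input.length(); index++) {
    char ch = input.charAt(index);
    if (ch < 128) {
        // 7-bit ASCII is copied through unchanged
        utfRepresentation.append(ch);
    } else {
        // Everything else becomes a u-escape; the int cast is required
        // because %X does not accept a char argument
        utfRepresentation.append(String.format("\\u%04X", (int) ch));
    }
}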

java

internationalization

utf

java-8