Parsing Arabic characters using Java

Question

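I'm trying to migrate a database and parse it into a new schema using Java. The problem is that some characters, specifically Arabic ones, get garbled when the data is processed in Java.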

This is one of the lines in the countryToParse.sql file that causes the problem:

(4, 'Afganistán', 1, 'Afgano', 'Afghanistan', 'AF', 'أفغانستان', 'Afghan', 'أفغاني');

After parsing, the corresponding line in countryParsed.sql looks like this:

(4, 'Afganistán', 1, 'Afgano', 'Afghanistan', 'AF', 'أ�?غانستان', 'Afghan', 'أ�?غاني');

You can see how some of the Arabic characters got garbled.
If I open the files, I can verify that both of them are encoded in UTF-8.

Here is the Java code I'm using. In the writeToTextFile() method I've included the three ways I found to write a file as UTF-8 (needless to say, I get the same error with all three).

public class MainWhosebug {

    public static void main(String[] args) throws IOException {
        String countryStr = new String(readTextFile("src/data/countryToParse.sql").getBytes(), "UTF-8");
        writeToTextFile("src/data/countryParsed.sql", countryStr);
    }

    public static String readTextFile(String fileName) throws IOException {
        String content = new String(Files.readAllBytes(Paths.get(fileName)));
        return content;
    }

    public static void writeToTextFile(String fileName, String content) throws IOException {

        /* Way 1 */
        Files.write(Paths.get(fileName), content.getBytes("UTF-8"), StandardOpenOption.CREATE);

        /* Way 2 */
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(fileName), "UTF-8"));
        try {
            out.write(content);
        } finally {
            out.close();
        }

        /* Way 3 */
        PrintWriter out1 = new PrintWriter(new File(fileName), "UTF-8");
        out1.write(content);
        out1.flush();
        out1.close();
        /* */
    }
}

Answer 1

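You forgot to specify the encoding on this line, so the bytes are decoded with the platform default charset, which is evidently not UTF-8 on your machine: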

String content = new String(Files.readAllBytes(Paths.get(fileName)));
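
A minimal sketch of why the missing charset matters. It assumes, purely for illustration, that the platform default charset is windows-1252 (the question does not say which one was in effect). The class name CharsetMismatchDemo and the methods wrongRoundTrip and correctDecode are hypothetical. The first method mimics the question's flow: decode the UTF-8 bytes with the wrong charset, re-encode them with that same charset, then decode the result as UTF-8. Under this assumption some bytes have no mapping in the wrong charset, so the round trip is lossy and the final string should no longer match the original.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Assumption for illustration only: the platform default was windows-1252.
    static final Charset ASSUMED_DEFAULT = Charset.forName("windows-1252");

    // Mimics the question's flow: decode UTF-8 bytes with the wrong charset,
    // re-encode with that same charset, then decode the result as UTF-8.
    public static String wrongRoundTrip(byte[] utf8Bytes) {
        String misdecoded = new String(utf8Bytes, ASSUMED_DEFAULT);
        byte[] reencoded = misdecoded.getBytes(ASSUMED_DEFAULT);
        return new String(reencoded, StandardCharsets.UTF_8);
    }

    // Decodes the bytes once, with the charset the file was actually written in.
    public static String correctDecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Arabic word for "Afghanistan" from the question, written with
        // Unicode escapes so the source file's own encoding does not matter.
        String original = "\u0623\u0641\u063A\u0627\u0646\u0633\u062A\u0627\u0646";
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("wrong round trip preserved text: "
                + original.equals(wrongRoundTrip(fileBytes)));
        System.out.println("explicit UTF-8 preserved text:   "
                + original.equals(correctDecode(fileBytes)));
    }
}
```

Once the bytes have been decoded with the wrong charset, wrapping the result in another new String(..., "UTF-8") cannot reliably repair it, which is why the fix belongs at the point where the file is read.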

Try this:

public static void main(String[] args) throws IOException {

    String countryStr = new String(readTextFile("src/data/countryToParse.sql"), "UTF-8");
    writeToTextFile("src/data/countryParsed.sql", countryStr);
}

public static byte[] readTextFile(String fileName) throws IOException {
    return Files.readAllBytes(Paths.get(fileName));
}
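
The same idea can be sketched end to end with the charset made explicit on both the read and the write side. This is only an illustration: the class name Utf8FileCopy, the helpers readUtf8 and writeUtf8, and the use of temporary files instead of the src/data paths are assumptions, not part of the original code. StandardCharsets.UTF_8 is used in place of the "UTF-8" string so that no checked UnsupportedEncodingException is involved.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileCopy {

    // Reads the whole file and decodes it explicitly as UTF-8.
    public static String readUtf8(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Encodes the content explicitly as UTF-8 and writes it out.
    // With no open options given, Files.write creates the file or truncates an existing one.
    public static void writeUtf8(Path path, String content) throws IOException {
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for countryToParse.sql and countryParsed.sql.
        Path source = Files.createTempFile("countryToParse", ".sql");
        Path target = Files.createTempFile("countryParsed", ".sql");
        try {
            // Shortened sample row with Unicode escapes for the non-ASCII text.
            String row = "(4, 'Afganist\u00E1n', 'AF', "
                    + "'\u0623\u0641\u063A\u0627\u0646\u0633\u062A\u0627\u0646');";
            writeUtf8(source, row);

            // Read and write with the same explicit charset, no default involved.
            writeUtf8(target, readUtf8(source));

            System.out.println("text preserved: " + row.equals(readUtf8(target)));
        } finally {
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
        }
    }
}
```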

