Java: Byte array prints unknown values for same string

Question

I have the following String stored in a text file and also as a variable in Java: ‘destructive’

Below is my code:

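import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.io.FileUtils; // Apache Commons IO

public class SimpleTest {

    public static void main(String[] args) {
        try {
            // Decode the file as UTF-8, then re-encode the resulting String as UTF-8.
            File file = new File("TestFIle.txt");
            byte[] file_encoded = FileUtils.readFileToString(file, "UTF-8").getBytes("UTF-8");
            System.out.println(Arrays.toString(file_encoded));

            // Encode the same text from a String literal.
            String toEncrypt = "‘destructive’";
            byte[] encoded = toEncrypt.getBytes(Charset.forName("UTF-8"));
            System.out.println(Arrays.toString(encoded));
        } catch (IOException ex) {
            Logger.getLogger(SimpleTest.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}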

As you can see:

String toEncrypt = "‘destructive’";

The content of TestFIle.txt is also: ‘destructive’

When I run the code I get:

[-17, -69, -65, -30, -128, -104, 100, 101, 115, 116, 114, 117, 99, 116, 105, 118, 101, -30, -128, -103]
[-30, -128, -104, 100, 101, 115, 116, 114, 117, 99, 116, 105, 118, 101, -30, -128, -103]

What is the additional [-17, -69, -65] at the beginning of the byte array when the same text is read from the file, and why am I getting it?

Answer 1

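The leading [-17, -69, -65] is the UTF-8 byte order mark (BOM). In hexadecimal the BOM is [0xEF, 0xBB, 0xBF], which is [239, 187, 191] in unsigned decimal. Because Java's byte type is signed (range -128 to 127), each value above 127 is interpreted and printed as that value minus 256, so 239 becomes -17, 187 becomes -69, and 191 becomes -65.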
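To make the signed/unsigned relationship concrete, here is a minimal sketch. The class name SignedByteDemo and the helper toUnsigned are made up for illustration; the only library calls used are Arrays.toString and the bit mask `& 0xFF`, which maps a signed Java byte back to its 0..255 value.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the question's code.
class SignedByteDemo {

    // Convert each signed Java byte (-128..127) to its unsigned value (0..255).
    static int[] toUnsigned(byte[] bytes) {
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF; // equivalent to Byte.toUnsignedInt(bytes[i])
        }
        return out;
    }

    public static void main(String[] args) {
        // The UTF-8 BOM written as hex literals; the cast is required because
        // 0xEF, 0xBB and 0xBF do not fit in a signed byte.
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

        System.out.println(Arrays.toString(bom));             // [-17, -69, -65]
        System.out.println(Arrays.toString(toUnsigned(bom))); // [239, 187, 191]
    }
}
```

The two printed arrays describe the same three bytes; only the interpretation of the top bit differs.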

In general, the BOM is optional in UTF-8, and it seems to be common in the Microsoft ecosystem: https://superuser.com/questions/1553666/utf-8-vs-utf-8-with-bom

Answer 2

Your file appears to contain text encoded as UTF-8 with a leading byte order mark (BOM). The UTF-8 BOM is EF BB BF. Read as signed 8-bit two's complement values, those bytes are -17 -69 -65.
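If the goal is for both arrays to match, one possible approach is to drop the BOM after decoding. Java's UTF-8 decoder keeps the BOM as the character U+FEFF at the start of the String, which is why it reappears when the String is encoded again. The sketch below is only an illustration: the class BomStripDemo and the helper stripBom are hypothetical names, and the byte array stands in for the file contents shown in the question's output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the question's code.
class BomStripDemo {

    // Remove a single leading U+FEFF (the decoded BOM), if present.
    static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // Stand-in for the file contents: the first array printed in the question.
        byte[] fileBytes = {-17, -69, -65, -30, -128, -104, 100, 101, 115, 116,
                            114, 117, 99, 116, 105, 118, 101, -30, -128, -103};

        // Decoding keeps the BOM as the first character of the String.
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);
        String cleaned = stripBom(decoded);

        byte[] fromFile = cleaned.getBytes(StandardCharsets.UTF_8);
        byte[] fromLiteral = "‘destructive’".getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.toString(fromFile));
        System.out.println(Arrays.equals(fromFile, fromLiteral)); // true
    }
}
```

In the question's code, the same helper could be applied to the result of FileUtils.readFileToString before calling getBytes. Compile the sketch with UTF-8 source encoding so the curly quotes in the literal are read correctly.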

java

arrays

encoding

character-encoding