Encoding and decoding a byte array to a string without data loss

Question

I am trying to convert a byte[] to a String like this:

Map<String, String> biomap = new HashMap<String, String>();
biomap.put("L1", new String(Lf1, "ISO-8859-1"));

Here Lf1 is a byte[] array. Later I convert this string back to a byte[]. The problem is that when I convert the byte array to a string, it looks like this:

FMR  F P�d@� �0d@r (@� ......... etc

This is how I convert it back:

String SF1 = biomap.get("L1");
byte[] storedL1 = SF1.getBytes("ISO-8859-1");

When I convert it back to a byte array and compare the two arrays, the comparison returns false. In other words, the data has changed.

I want to get back the same byte[] data after encoding it to a String and decoding it to a byte[] again.

Answer 1

There are special encodings, such as Base64, designed for carrying binary data through text-only systems.

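Converting a byte[] to a String is guaranteed to be reversible only if the byte[] contains byte sequences that are valid in the chosen encoding. Invalid byte sequences may be replaced with the Unicode replacement character (as in your example).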
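To illustrate the point about invalid byte sequences, here is a minimal sketch. The class name, the method names, and the sample bytes are made up for this example, and UTF-8 is used as the lossy charset only for illustration (the question itself uses ISO-8859-1). The bytes 0xFF and 0xFE never occur in well-formed UTF-8, so decoding replaces each of them with U+FFFD and the original values cannot be recovered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the question's code.
class LossyDecodingDemo {
    // Decodes the bytes as UTF-8, then encodes the resulting string back to UTF-8.
    static byte[] roundTripUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Same round trip, but through ISO-8859-1, where every byte maps to one char.
    static byte[] roundTripLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Sample data (an assumption): two bytes that are invalid in UTF-8, then 'A'.
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 0x41};

        // The invalid bytes become U+FFFD, so the UTF-8 round trip changes the data.
        System.out.println("UTF-8 intact: " + Arrays.equals(data, roundTripUtf8(data)));
        // ISO-8859-1 accepts every byte value, so this round trip preserves the data.
        System.out.println("ISO-8859-1 intact: " + Arrays.equals(data, roundTripLatin1(data)));
    }
}
```

The first line should report false and the second true, which is why the choice of charset matters when a String is used to hold arbitrary bytes.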

Answer 2

First of all: ISO-8859-1 does not cause any data loss when an arbitrary byte array is converted to a string with it. Consider the following program:

public class BytesToString {
    public static void main(String[] args) throws Exception {
        // array that will contain all the possible byte values
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++) {
            bytes[i] = (byte) (i + Byte.MIN_VALUE);
        }

        // converting to string and back to bytes
        String str = new String(bytes, "ISO-8859-1");
        byte[] newBytes = str.getBytes("ISO-8859-1");

        if (newBytes.length != 256) {
            throw new IllegalStateException("Wrong length");
        }
        boolean mismatchFound = false;
        for (int i = 0; i < 256; i++) {
            if (newBytes[i] != bytes[i]) {
                System.out.println("Mismatch: " + bytes[i] + "->" + newBytes[i]);
                mismatchFound = true;
            }
        }
        System.out.println("Whether a mismatch was found: " + mismatchFound);
    }
}

It builds a byte array containing every possible byte value, converts it to a String using ISO-8859-1, and then converts it back to bytes using the same encoding.

This program prints Whether a mismatch was found: false, so a bytes->String->bytes conversion through ISO-8859-1 yields the same data it started with.
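Since the round trip itself is lossless, it may be worth checking how the two arrays are compared. The question does not show that code, so the following is only a guess at one possible cause: in Java, == and byte[].equals compare array references, not contents, while Arrays.equals compares element by element. A small sketch, with a hypothetical class name and sample bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the comparison code in the question is not shown.
class ArrayComparisonDemo {
    // bytes -> String -> bytes through ISO-8859-1, as in the question.
    static byte[] roundTrip(byte[] data) {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = {0x46, 0x4D, 0x52, (byte) 0x80, (byte) 0xFF}; // sample data
        byte[] copy = roundTrip(original);

        // Reference comparisons: false, because copy is a different array object.
        System.out.println(original == copy);
        System.out.println(original.equals(copy));

        // Content comparison: true, because every element matches.
        System.out.println(Arrays.equals(original, copy));
    }
}
```

If the comparison in the real code already uses Arrays.equals, this is not the cause and the data is being altered somewhere else between the two conversions.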

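However, as pointed out in the comments, String is not a good container for binary data. Specifically, such a string will almost certainly contain unprintable characters, so if you print it or try to pass it through HTML or some other channel, you will run into problems (data loss, for example).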

If you really need to convert a byte array to a string (and treat it as opaque), use Base64 encoding:

// java.util.Base64 is available since Java 8 (import java.util.Base64;)
String stringRepresentation = Base64.getEncoder().encodeToString(bytes);
byte[] decodedBytes = Base64.getDecoder().decode(stringRepresentation);

It takes more space, but the resulting string is safe to print.
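Applied to the map from the question, the Base64 approach could look like the sketch below. The names biomap, "L1", and storedL1 come from the question; the class name, the helper methods, and the sample bytes are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing a Map<String, String> that carries binary data.
class Base64MapRoundTrip {
    // Encodes arbitrary bytes as a printable ASCII string.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decodes a Base64 string back to the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Sample data (an assumption), including a zero byte and bytes above 0x7F.
        byte[] lf1 = {0x46, 0x4D, 0x52, 0x00, (byte) 0x80, (byte) 0xFF};

        Map<String, String> biomap = new HashMap<>();
        biomap.put("L1", encode(lf1));

        // The stored value contains only Base64 alphabet characters.
        String sf1 = biomap.get("L1");
        byte[] storedL1 = decode(sf1);

        System.out.println("Stored value: " + sf1);
        System.out.println("Round trip intact: " + Arrays.equals(lf1, storedL1));
    }
}
```

The encoded string is roughly one third longer than the raw data, which is the space cost mentioned above.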
