Convert String to/from byte array without encoding

Question

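I have a byte array read from a network connection that I need to convert to a String without any encoding. That is, each byte should simply become the low byte of a character, with the high byte left as zero. I also need to do the reverse, and I know the high byte of each character is always zero.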

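Searching the web turned up several similar questions, and every reply said that the original data source has to be changed. That is not an option, so please don't suggest it.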

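This is trivial in C, but Java seems to require me to write my own conversion routine, which could be very inefficient. Is there a simple way that I've missed?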

Answer 1

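Here is sample code that converts a String to a byte array and back to a String without using a character encoding. Note that it stores each char as two bytes, high byte first, rather than the one byte per char that the question describes.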

public class Test
{
    public static void main(String[] args)
    {
        String input = "Hèllo world";
        byte[] inputBytes = getBytes(input);
        String output = getString(inputBytes);
        System.out.println(output);
    }

    // Splits each 16-bit char into two bytes, high byte first (big-endian).
    public static byte[] getBytes(String str)
    {
        char[] chars = str.toCharArray();
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++)
        {
            bytes[i * 2] = (byte) (chars[i] >> 8);
            bytes[i * 2 + 1] = (byte) chars[i];
        }
        return bytes;
    }

    // Reassembles each pair of bytes (high byte first) into one char.
    public static String getString(byte[] bytes)
    {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++)
        {
            // Mask both bytes so sign extension cannot leak into the result.
            chars[i] = (char) (((bytes[i * 2] & 0xFF) << 8) | (bytes[i * 2 + 1] & 0xFF));
        }
        return new String(chars);
    }
}
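The two-bytes-per-char layout in Answer 1 appears to be the same byte sequence that the standard UTF-16BE charset produces for well-formed strings. A minimal sketch to check that assumption (the class and method names are illustrative, not from the answer):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: compares a manual high-byte-first split of each char with the
// built-in UTF-16BE charset. Names here are illustrative only.
class Utf16BeCheck
{
    // Same layout as Answer 1: high byte first, then low byte, per char.
    static byte[] manualBytes(String str)
    {
        byte[] bytes = new byte[str.length() * 2];
        for (int i = 0; i < str.length(); i++)
        {
            char c = str.charAt(i);
            bytes[i * 2] = (byte) (c >> 8);
            bytes[i * 2 + 1] = (byte) c;
        }
        return bytes;
    }

    public static void main(String[] args)
    {
        String input = "H\u00e8llo world";
        byte[] manual = manualBytes(input);
        // UTF_16BE writes no byte-order mark, so the lengths line up.
        byte[] builtIn = input.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.equals(manual, builtIn)); // true
    }
}
```

The two can differ for strings containing unpaired surrogates, because String.getBytes(Charset) replaces malformed input instead of copying it.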

Answer 2

This converts a byte array to a String, filling only the low 8 bits of each char (the high 8 bits stay zero).

public static String stringFromBytes(byte[] byteData) {
    char[] charData = new char[byteData.length];
    for (int i = 0; i < charData.length; i++) {
        // The mask puts the byte in the low 8 bits and keeps the high 8 bits zero.
        charData[i] = (char) (byteData[i] & 0xFF);
    }
    return new String(charData);
}

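The efficiency should be reasonable. As Ben Thurley said, if performance really is a concern, don't convert to a String in the first place; work with the byte array instead.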
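The question also asks for the reverse direction. A minimal sketch, assuming (as the asker states) that every char's high byte is zero, so narrowing each char back to a byte loses nothing; the class and method names are illustrative:

```java
// Sketch: low-byte-only conversion in both directions.
// Assumes every char has a zero high byte; other chars would be truncated.
class LowByteConversion {
    static String stringFromBytes(byte[] byteData) {
        char[] charData = new char[byteData.length];
        for (int i = 0; i < charData.length; i++) {
            // Byte goes into the low 8 bits; high 8 bits stay zero.
            charData[i] = (char) (byteData[i] & 0xFF);
        }
        return new String(charData);
    }

    static byte[] bytesFromString(String str) {
        byte[] byteData = new byte[str.length()];
        for (int i = 0; i < byteData.length; i++) {
            // The cast discards the (assumed zero) high byte of the char.
            byteData[i] = (byte) str.charAt(i);
        }
        return byteData;
    }

    public static void main(String[] args) {
        byte[] original = { 72, 105, (byte) 0xE8, (byte) 0xFF, 0 };
        byte[] back = bytesFromString(stringFromBytes(original));
        System.out.println(java.util.Arrays.equals(original, back)); // true
    }
}
```

Because each byte maps to exactly one char, this round-trips all 256 byte values.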

Answer 3

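Strings are already encoded as Unicode/UTF-16. UTF-16 means that up to 2 string "characters" (char) may be needed to form one displayable character. (Note that the code in this answer uses the .NET System.Text.Encoding API, not Java.) What you really want to use is: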

byte[] bytes = System.Text.Encoding.Unicode.GetBytes(myString);

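to convert the string to a byte array (Encoding.Unicode is little-endian UTF-16). This does the same job as converting each char by hand, only about 10 times faster. If you want to cut the transferred data almost in half, I'd suggest converting to UTF-8 instead (ASCII is a subset of UTF-8), which is the format used on the internet about 90% of the time, by calling: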

byte[] bytes = Encoding.UTF8.GetBytes(myString);

To convert back to a string, use:

String myString = Encoding.Unicode.GetString(bytes);

or

String myString = Encoding.UTF8.GetString(bytes);
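Since the question is about Java, here is a sketch of what appear to be the closest Java counterparts, String.getBytes(Charset) and new String(byte[], Charset) with StandardCharsets. It assumes .NET's Encoding.Unicode corresponds to little-endian UTF-16 without a byte-order mark; the class name is illustrative:

```java
import java.nio.charset.StandardCharsets;

// Sketch: Java equivalents of the .NET Encoding calls above.
// Assumes Encoding.Unicode ~ UTF-16LE (no BOM) and Encoding.UTF8 ~ UTF-8.
class CharsetRoundTrip {
    public static void main(String[] args) {
        String myString = "H\u00e9llo";

        // Like Encoding.Unicode.GetBytes / GetString: 2 bytes per char here.
        byte[] utf16 = myString.getBytes(StandardCharsets.UTF_16LE);
        String fromUtf16 = new String(utf16, StandardCharsets.UTF_16LE);

        // Like Encoding.UTF8.GetBytes / GetString: 1 byte per ASCII char.
        byte[] utf8 = myString.getBytes(StandardCharsets.UTF_8);
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(utf16.length); // 10
        System.out.println(utf8.length);  // 6 (the accented e takes two bytes)
        System.out.println(fromUtf16.equals(myString) && fromUtf8.equals(myString)); // true
    }
}
```

Neither charset maps one arbitrary byte to one char, so neither gives the byte-per-char conversion the question describes; they are text encodings.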

Answer 4

Use the deprecated constructor String(byte[] ascii, int hibyte):

String string = new String(byteArray, 0);
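The same deprecated family also covers the reverse direction: String.getBytes(int srcBegin, int srcEnd, byte[] dst, int dstBegin) copies only the low byte of each char. A sketch of both directions, with deprecation warnings suppressed (the class and method names are illustrative):

```java
// Sketch: deprecated byte<->char conversions that ignore character encodings.
// Both calls still compile, but emit deprecation warnings (suppressed here).
class DeprecatedConversion
{
    @SuppressWarnings("deprecation")
    static String fromBytes(byte[] byteArray)
    {
        // hibyte = 0: each char becomes (0 << 8) | (byte & 0xFF).
        return new String(byteArray, 0);
    }

    @SuppressWarnings("deprecation")
    static byte[] toBytes(String string)
    {
        byte[] dst = new byte[string.length()];
        // Copies only the low 8 bits of each char into dst.
        string.getBytes(0, string.length(), dst, 0);
        return dst;
    }

    public static void main(String[] args)
    {
        byte[] data = { 72, 105, (byte) 0xE8, (byte) 0xFF };
        byte[] back = toBytes(fromBytes(data));
        System.out.println(java.util.Arrays.equals(data, back)); // true
    }
}
```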

Answer 5

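No, you're not missing anything. There is no simple way to do this, because String and char are meant for text. You apparently don't want to treat the data as text, which makes perfect sense if it isn't text. You can do it the hard way, as you suggested.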

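Another approach is to pick a character encoding that accepts any sequence of arbitrary byte values (0-255). ISO-8859-1 and IBM437 both qualify. (Windows-1252 has only 251 code points, and UTF-8 does not allow arbitrary sequences.) If you use ISO-8859-1, the resulting string will be the same as with your hard way.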
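A minimal sketch of the ISO-8859-1 approach using the standard StandardCharsets.ISO_8859_1 constant. In Java this charset maps each byte 0x00-0xFF to the char U+0000-U+00FF with the same value, which is exactly the low-byte-only conversion; the class name is illustrative:

```java
import java.nio.charset.StandardCharsets;

// Sketch: ISO-8859-1 as a byte<->char identity mapping for values 0-255.
class Latin1Conversion
{
    static String fromBytes(byte[] bytes)
    {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static byte[] toBytes(String s)
    {
        // Chars above U+00FF are unmappable and get replaced, so this only
        // round-trips strings whose chars all have a zero high byte.
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args)
    {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++)
        {
            all[i] = (byte) i;
        }
        String s = fromBytes(all);
        System.out.println((int) s.charAt(0xE8));                      // 232
        System.out.println(java.util.Arrays.equals(all, toBytes(s))); // true
    }
}
```

The JDK's built-in decoder is likely at least as fast as a hand-written loop, which also addresses the efficiency concern in the question.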

As for efficiency, the most efficient way to handle a byte array is to keep it as a byte array.
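If some code needs text-like access while the data stays a byte array, one option is to implement CharSequence over the bytes. The ByteCharSequence class below is a hypothetical sketch, not a JDK API: it exposes each byte as a char with a zero high byte without copying, so APIs that accept a CharSequence (such as java.util.regex) can read the bytes directly:

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a read-only, zero-copy CharSequence view over a byte[].
// Each byte is presented as a char in the range U+0000-U+00FF.
final class ByteCharSequence implements CharSequence
{
    private final byte[] data;
    private final int start;
    private final int end;

    ByteCharSequence(byte[] data)
    {
        this(data, 0, data.length);
    }

    private ByteCharSequence(byte[] data, int start, int end)
    {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length()
    {
        return end - start;
    }

    @Override
    public char charAt(int index)
    {
        if (index < 0 || index >= length())
        {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (char) (data[start + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int from, int to)
    {
        if (from < 0 || to > length() || from > to)
        {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        // Shares the same array; no bytes are copied.
        return new ByteCharSequence(data, start + from, start + to);
    }

    @Override
    public String toString()
    {
        // Only this method copies; ISO-8859-1 matches the charAt mapping.
        return new String(data, start, end - start, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args)
    {
        byte[] packet = { 'H', 'e', 'l', 'l', 'o' };
        Matcher m = Pattern.compile("l+").matcher(new ByteCharSequence(packet));
        if (m.find())
        {
            System.out.println(m.group()); // ll
        }
    }
}
```

Whether this pays off depends on the consumer; anything that calls toString() still makes a copy.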

java

string

data-conversion