Data loss when writing bytes to a file

Question

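I am working on a string compressor for a school assignment, and there is one bug I can't seem to work out. The compressed data, represented by a byte array, is written to a file using a FileWriter. The compression algorithm returns an input stream, so the data flows like this: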

piped input stream
-> input stream reader
-> data stored in char buffer
-> data written to file with file writer.

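Now, the bug is that with some very specific strings, the second-to-last byte in the byte array is written wrong, and it is always the same bit value, "11111100".

Every time it is this bit value, and it is always the second-to-last byte.

Here are some samples from the code: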

  InputStream compress(InputStream in) {

    //...
    //...

    PipedInputStream pin = new PipedInputStream();
    PipedOutputStream pout = new PipedOutputStream(pin);
    ObjectOutputStream oos = new ObjectOutputStream(pout);

    oos.writeObject(someobject);
    oos.flush();

    DataOutputStream dos = new DataOutputStream(pout);

    dos.writeFloat(//);
    dos.writeShort(//);
    dos.write(SomeBytes); // ---Here
    dos.flush();
    dos.close();

    return pin;
  }

  void write(char[] cbuf, int off, int len) {

    //....
    //....

    InputStreamReader s = new InputStreamReader(
        c.compress(new ByteArrayInputStream(str.getBytes())));

    s.read(charbuffer);

    out.write(charbuffer);
  }

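An example of a string that triggers it is "hello and good evenin".

I have tried iterating over the byte array and writing the bytes one by one, but it didn't help.

It may also be worth noting that when I tried writing to the file with an output stream inside the algorithm itself, it worked fine. This design was not my choice, by the way.

So I'm not really sure what I'm doing wrong.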

Answer 1

Considering that you say:

Now, the bug is, that with some very specific strings, the second to last byte in the byte array is written wrong. and it's always the same bit values "11111100".

You are taking a

binary stream  (the compressed data)
-> reading it as chars 
-> then writing it as chars.

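and you are converting bytes to chars without explicitly specifying the encoding.

I would say the problem is that your InputStreamReader is translating some byte sequences in a way you do not expect.

Remember that in encodings like UTF-8, two or three bytes can become a single char.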
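To make the point about variable-width characters concrete, here is a minimal sketch that counts how many bytes UTF-8 needs for a few single characters. The class and method names (`Utf8Widths`, `utf8Length`) are made up for illustration and are not part of the asker's code.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    // Number of bytes UTF-8 needs to encode the given text.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Each argument below is a single char, written as an escape
        // so the source file's own encoding does not matter.
        System.out.println(utf8Length("a"));       // plain ASCII letter
        System.out.println(utf8Length("\u00e9"));  // e with acute accent
        System.out.println(utf8Length("\u20ac"));  // euro sign
    }
}
```

The three calls report 1, 2 and 3 bytes respectively, which is why a byte count and a char count cannot be assumed to match once a reader is involved.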

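It is no coincidence that the byte pattern you pointed out (11111100) matches one of the UTF-8 lead-byte patterns (1111110x in the original UTF-8 design). Check this Wikipedia table and you will see that UTF-8 is destructive for arbitrary binary data: if a byte starts with 1111110x, the next byte must start with 10xxxxxx. When the decoder does not find a valid sequence, it typically substitutes a replacement character, so the original byte is lost.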

This means that if you use UTF-8 to convert

 bytes1[] -> chars[] -> bytes2[]

then in some cases bytes2 will differ from bytes1.
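The round trip above can be reproduced with a small sketch. The three-byte input is a hypothetical stand-in for the tail of a compressed stream (only the 0xFC byte comes from the question), and `RoundTripDemo` and `roundTrip` are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // bytes1[] -> chars[] -> bytes2[] using one charset for both steps.
    public static byte[] roundTrip(byte[] bytes1, Charset charset) {
        String chars = new String(bytes1, charset);
        return chars.getBytes(charset);
    }

    // True when the round trip returns exactly the bytes that went in.
    public static boolean survives(byte[] bytes1, Charset charset) {
        return Arrays.equals(bytes1, roundTrip(bytes1, charset));
    }

    public static void main(String[] args) {
        // Hypothetical binary data; 0xFC is the byte from the question.
        byte[] bytes1 = {0x12, (byte) 0xFC, 0x34};

        System.out.println("UTF-8:      " + survives(bytes1, StandardCharsets.UTF_8));
        System.out.println("US-ASCII:   " + survives(bytes1, StandardCharsets.US_ASCII));
        System.out.println("ISO-8859-1: " + survives(bytes1, StandardCharsets.ISO_8859_1));
    }
}
```

With this input only ISO-8859-1 reports `true`. The other two charsets have no char for the byte 0xFC on its own, so decoding substitutes a replacement character and the re-encoded bytes differ.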

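I suggest changing your code to remove those readers. Alternatively, specify a single-byte encoding such as ASCII explicitly to see whether that prevents the translation; ISO-8859-1 is the safer choice, since it maps all 256 byte values to chars, whereas ASCII only covers 0 to 127.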
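Removing the readers could look roughly like the sketch below: the bytes from the input stream go straight to an OutputStream, with no char conversion anywhere. This is only an illustration under assumptions: `ByteCopyDemo`, `copy` and `writeAndReadBack` are hypothetical names, a ByteArrayInputStream stands in for the stream returned by `compress`, and a temp file stands in for the real output file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ByteCopyDemo {
    // Copy every byte from in to out; no Reader or Writer is involved.
    public static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Demo helper: write the data to a temp file as raw bytes, then read it back.
    public static byte[] writeAndReadBack(byte[] data) {
        try {
            Path file = Files.createTempFile("compressed", ".bin");
            try {
                // The ByteArrayInputStream is a stand-in for the compressor's stream.
                try (InputStream in = new ByteArrayInputStream(data);
                     OutputStream out = Files.newOutputStream(file)) {
                    copy(in, out);
                }
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x12, (byte) 0xFC, 0x34};
        System.out.println(Arrays.equals(data, writeAndReadBack(data)));
    }
}
```

Because nothing in this path decodes or encodes text, the file should contain exactly the bytes the stream produced, whatever their values.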

Answer 2

I solved this by encoding and decoding the bytes with Base64.
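The answer does not show its code, so the following is only a sketch of what a Base64 approach might look like using `java.util.Base64` (Java 8 and later). The names `Base64Demo`, `toText` and `fromText` are made up for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Demo {
    // Turn arbitrary bytes into ASCII-only text that is safe to pass through a Writer.
    public static String toText(byte[] compressed) {
        return Base64.getEncoder().encodeToString(compressed);
    }

    // Recover the original bytes from the Base64 text.
    public static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Hypothetical binary data; 0xFC is the byte from the question.
        byte[] compressed = {0x12, (byte) 0xFC, 0x34};

        String text = toText(compressed);   // could be written with a FileWriter
        byte[] restored = fromText(text);   // done after reading the text back

        System.out.println(Arrays.equals(compressed, restored));
    }
}
```

The trade-off is size: Base64 output is about one third larger than the input, which works against the goal of a compressor, so writing raw bytes is usually preferable when the design allows it.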

java

compression

string

file

stream