Strange behavior in GZIPOutputStream/GZIPInputStream

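I have reduced the strange problem to the minimal code below. The program writes the 4 bytes of (int)90000 to a file 128,000 times and then tries to read them back.

With zipped=false everything works fine. With zipped=true everything works until the 496th 1024-byte chunk. At that point one byte is lost and everything after it is shifted left by one byte (see output):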

...
0 1 95 -112 <- the bytes of int 90,000
Counters: 496 126937
1 95 -112 0 <- the bytes of int 23,040,000
...
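
To make the two values in the output above concrete, here is a small sketch of the arithmetic. It assumes big-endian byte order, which matches the bytes printed above; the class and method names (`ShiftedIntDemo`, `toBytes`, `fromBytes`) are made up for illustration and are not part of the original program.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ShiftedIntDemo {

    // Big-endian bytes of an int (ByteBuffer is big-endian by default).
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Interpret the 4 bytes starting at offset as a big-endian int.
    static int fromBytes(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4).getInt();
    }

    public static void main(String[] args) {
        byte[] one = toBytes(90000);
        System.out.println(Arrays.toString(one)); // [0, 1, 95, -112]

        // Two consecutive copies of the value, as they sit in the file.
        byte[] two = new byte[8];
        System.arraycopy(one, 0, two, 0, 4);
        System.arraycopy(one, 0, two, 4, 4);

        System.out.println(fromBytes(two, 0)); // 90000
        System.out.println(fromBytes(two, 1)); // 23040000
    }
}
```

Reading the same data one byte late yields 90,000 * 256 = 23,040,000, so the second value in the output is the same byte pattern seen from an offset of one, not corrupted data.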

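Here is the code I came up with. I just can't understand why it would suddenly break in the middle of doing the same thing over and over. Any help/insights/explanations are much appreciated.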

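import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TestApp7 {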

static final boolean    zipped = true;
static File             theFile = null;

private static void writeZipData() throws Exception {
    FileOutputStream fos = new FileOutputStream(theFile);
    BufferedOutputStream bos = null;
    if (zipped) {
        GZIPOutputStream gzout = new GZIPOutputStream(fos);
        bos = new BufferedOutputStream(gzout);
    } else 
        bos = new BufferedOutputStream(fos);
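    // RHUtilities is my own helper class (not shown); judging by the output,
    // toByteArray returns the 4 bytes of the int in big-endian order
    byte[] bs9 = RHUtilities.toByteArray((int)90000);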
    for (int i=0; i<128000; i++)
        bos.write(bs9);
    bos.flush();
    bos.close();
}

private static void readZipData() throws Exception {
    byte[] buf = new byte[1024];
    int chunkCounter = 0;
    int intCounter = 0;
    FileInputStream fin = new FileInputStream(theFile);
    int rdLen = 0;
    if (zipped) {
        GZIPInputStream gin = new GZIPInputStream(fin);
        while ((rdLen = gin.read(buf)) != -1) {
            System.out.println("Counters: " + chunkCounter + " " + intCounter);
            for (int i=0; i<rdLen/4; i++) {
                byte[] bs = Arrays.copyOfRange(buf,(i*4),((i+1)*4));
                intCounter++;
                System.out.print(bs[0] + " " + bs[1] + " " + bs[2] + " " + bs[3]);
            }
            chunkCounter++;
        }
        gin.close();
    } else {
        while ((rdLen = fin.read(buf)) != -1) {
            System.out.println("Counters: " + chunkCounter + " " + intCounter);
            for (int i=0; i<rdLen/4; i++) {
                byte[] bs = Arrays.copyOfRange(buf,(i*4),((i+1)*4));
                intCounter++;
                System.out.print(bs[0] + " " + bs[1] + " " + bs[2] + " " + bs[3]);
            }
            chunkCounter++;
        }
    }
    fin.close();
}

public static void main(String args[]) {
    try {
        if (zipped)
            theFile = new File("Test.gz");
        else
            theFile = new File("Test.dat");
        writeZipData();
        readZipData();
    } catch (Throwable e) { e.printStackTrace(); }
}
}

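So, based on Jon's excellent comment: you cannot rely on .read(buffer) filling the buffer, even when there are more bytes in the stream. Here it stopped at what appears to be the boundary of a block of data saved by the GZIPOutputStream wrapped in the BufferedOutputStream. When such a short read returns a count that is not a multiple of 4, the rdLen/4 loop silently drops the leftover bytes, which is why everything after that point looked shifted. The fix is to read again past the boundary and complete the chunk: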

        while ((rdLen = gin.read(buf)) != -1) {
            // read() may return fewer bytes than requested, so keep reading
            // until the chunk is full or the stream ends
            while (rdLen < buf.length) {
                int more = gin.read(buf, rdLen, buf.length - rdLen);
                if (more == -1)
                    break;
                rdLen += more;
            }
            for (int i=0; i<rdLen/4; i++) {
                byte[] bs = Arrays.copyOfRange(buf,(i*4),((i+1)*4));
                intCounter++;
                System.out.println(bs[0] + " " + bs[1] + " " + bs[2] + " " + bs[3] + " ");
            }
            chunkCounter++;
        }
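
As an alternative to topping up the buffer by hand, the standard library can do the "read exactly 4 bytes" part. The following is a minimal in-memory sketch, not the original program: it assumes the ints are stored big-endian (which is what `DataOutputStream.writeInt` produces), and the class and method names (`GzipIntRoundTrip`, `writeInts`, `countInts`, `roundTrip`) are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipIntRoundTrip {

    // Compress `count` copies of `value` into an in-memory gzip stream.
    static byte[] writeInts(int value, int count) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(sink))) {
            for (int i = 0; i < count; i++) {
                out.writeInt(value); // always 4 bytes, big-endian
            }
        } // closing the stream finishes the gzip trailer
        return sink.toByteArray();
    }

    // Read ints until end of stream. Returns how many were read,
    // or -1 as soon as one differs from `expected`.
    static int countInts(byte[] gzipped, int expected) throws IOException {
        int n = 0;
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                new GZIPInputStream(new ByteArrayInputStream(gzipped))))) {
            while (true) {
                int v;
                try {
                    // readInt() keeps reading until it has all 4 bytes,
                    // no matter how the underlying reads are split up.
                    v = in.readInt();
                } catch (EOFException e) {
                    return n; // end of stream reached
                }
                if (v != expected) {
                    return -1;
                }
                n++;
            }
        }
    }

    // Convenience wrapper that turns I/O failures into unchecked exceptions.
    static int roundTrip(int value, int count) {
        try {
            return countInts(writeInts(value, count), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(90000, 128000)); // 128000
    }
}
```

If the data really has to be processed in 1024-byte chunks, `DataInputStream.readFully(byte[])` or, on Java 9 and later, `InputStream.readNBytes(byte[], int, int)` should serve the same purpose as the manual loop.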