ZLib decompression fails on large byte array
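While experimenting with ZLib compression, I have run across a strange problem. Decompressing a zlib-compressed byte array of random data fails reproducibly if the source array is at least 32752 bytes long. Here is a small program that reproduces the problem; you can see it in action on IDEOne. The compression and decompression methods are standard code taken from tutorials.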
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibMain {

    private static byte[] compress(final byte[] data) {
        final Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        final byte[] bytesCompressed = new byte[Short.MAX_VALUE];
        // Single deflate() call into a fixed-size buffer
        final int numberOfBytesAfterCompression = deflater.deflate(bytesCompressed);
        final byte[] returnValues = new byte[numberOfBytesAfterCompression];
        System.arraycopy(bytesCompressed, 0, returnValues, 0, numberOfBytesAfterCompression);
        return returnValues;
    }

    private static byte[] decompress(final byte[] data) {
        final Inflater inflater = new Inflater();
        inflater.setInput(data);
        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length)) {
            final byte[] buffer = new byte[Math.max(1024, data.length / 10)];
            while (!inflater.finished()) {
                final int count = inflater.inflate(buffer);
                outputStream.write(buffer, 0, count);
            }
            outputStream.close();
            final byte[] output = outputStream.toByteArray();
            return output;
        } catch (DataFormatException | IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(final String[] args) {
        roundTrip(100);
        roundTrip(1000);
        roundTrip(10000);
        roundTrip(20000);
        roundTrip(30000);
        roundTrip(32000);
        for (int i = 32700; i < 33000; i++) {
            if (!roundTrip(i)) break;
        }
    }

    private static boolean roundTrip(final int i) {
        System.out.printf("Starting round trip with size %d: ", i);
        final byte[] data = new byte[i];
        for (int j = 0; j < data.length; j++) {
            data[j] = (byte) j;
        }
        shuffleArray(data);

        final byte[] compressed = compress(data);
        try {
            // Run decompression with a timeout so a hang is reported as a failure
            final byte[] decompressed = CompletableFuture.supplyAsync(() -> decompress(compressed))
                    .get(2, TimeUnit.SECONDS);
            System.out.printf("Success (%s)%n", Arrays.equals(data, decompressed) ? "matching" : "non-matching");
            return true;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            System.out.println("Failure!");
            return false;
        }
    }

    // Implementing Fisher–Yates shuffle
    // source:
    static void shuffleArray(byte[] ar) {
        Random rnd = ThreadLocalRandom.current();
        for (int i = ar.length - 1; i > 0; i--) {
            int index = rnd.nextInt(i + 1);
            // Simple swap
            byte a = ar[index];
            ar[index] = ar[i];
            ar[i] = a;
        }
    }
}
Is this a known bug in ZLib? Or do I have an error in my compress/decompress routines?
Apparently the compress() method was faulty.
This one works:
public static byte[] compress(final byte[] data) {
    try (final ByteArrayOutputStream outputStream =
            new ByteArrayOutputStream(data.length)) {
        final Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        final byte[] buffer = new byte[1024];
        // Keep draining the deflater until the whole stream has been written
        while (!deflater.finished()) {
            final int count = deflater.deflate(buffer);
            outputStream.write(buffer, 0, count);
        }
        final byte[] output = outputStream.toByteArray();
        return output;
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}
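The logic of the compress/decompress methods is wrong. I am not that deep into the implementations, but by debugging I found the following:
When compressing the 32752-byte buffer, the deflater.deflate() method returns 32767, which is exactly the size to which you initialized the buffer in this line: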
final byte[] bytesCompressed = new byte[Short.MAX_VALUE];
If you increase the buffer size, for example to
final byte[] bytesCompressed = new byte[4 * Short.MAX_VALUE];
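you will see that the 32752 bytes of input actually deflate to 32768 bytes. The output is larger than the input because random data is essentially incompressible and the zlib framing adds a few bytes of overhead. So in your code, the compressed data is truncated: it does not contain all the data that is supposed to be there.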
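A quick way to confirm the truncation is to ask the Deflater whether it actually reached the end of the stream after that single deflate() call. The following is only a minimal sketch: the class name DeflateTruncationCheck and the helper fitsInOneCall are made up for illustration, and it uses plain random bytes rather than the shuffled array from the question. Going by the numbers above, the Short.MAX_VALUE buffer would be expected to report false and the larger buffer true.

```java
import java.util.Random;
import java.util.zip.Deflater;

class DeflateTruncationCheck {

    // Hypothetical helper: true if one deflate() call into a buffer of
    // bufferSize bytes was enough to emit the complete compressed stream.
    static boolean fitsInOneCall(byte[] data, int bufferSize) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] out = new byte[bufferSize];
            deflater.deflate(out);
            // finished() only becomes true once the end of the stream was written
            return deflater.finished();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[32752];
        new Random(42).nextBytes(data); // incompressible test input
        System.out.println("Short.MAX_VALUE buffer:     " + fitsInOneCall(data, Short.MAX_VALUE));
        System.out.println("4 * Short.MAX_VALUE buffer: " + fitsInOneCall(data, 4 * Short.MAX_VALUE));
    }
}
```

Whenever finished() is still false after a deflate() call, there is more compressed output waiting and the call has to be repeated.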
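Then, when you try to decompress, the inflater.inflate() method returns zero, which indicates that more input data is needed. But because you only check inflater.finished(), you end up in an endless loop.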
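So you can increase the buffer size when compressing, but that probably just moves the problem to larger inputs. The better option is to rewrite the compress/decompress logic so that it processes the data in chunks.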
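One possible shape for such a chunked rewrite is sketched below. It is not the only way to do it: the class name SafeZlib and the 4096-byte chunk size are arbitrary choices for illustration. Both loops run until finished() reports true, and the decompress loop also bails out when inflate() returns zero without having finished, so truncated input raises an exception instead of spinning forever.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class SafeZlib {

    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, data.length));
            byte[] buffer = new byte[4096];
            // Drain the deflater chunk by chunk; output size is not limited by the buffer
            while (!deflater.finished()) {
                int count = deflater.deflate(buffer);
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    static byte[] decompress(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, data.length));
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                if (count == 0 && !inflater.finished()) {
                    // No progress: input is exhausted or a preset dictionary is required
                    throw new DataFormatException(inflater.needsDictionary()
                            ? "Preset dictionary required"
                            : "Truncated or incomplete zlib stream");
                }
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }
}
```

With this version, feeding decompress() the truncated output of the original compress() should fail fast with a DataFormatException rather than hang until the timeout.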