Node zlib incremental inflate

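I've located the end of the local file header in the download stream of a large zip file, and now I want to inflate that data using Node zlib. But I can't figure out how to feed data to zlib and get feedback telling me when the deflate stream has terminated on its own.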

Does Node's zlib library support consuming chunks of deflate data and returning a result that lets the caller know when the deflate stream has ended?
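A minimal sketch of one possible answer, assuming a reasonably recent Node: the synchronous convenience method accepts an `info` option that returns the engine alongside the output, and the engine's `bytesWritten` counts the input bytes zlib actually consumed. `inflateEntry` is a hypothetical helper name, and the trailing bytes are made up to stand in for whatever follows an entry's compressed data.

```javascript
const zlib = require('zlib');

// Inflate one raw-deflate stream that may be followed by unrelated bytes
// (e.g. a data descriptor or the next local file header). Returns the
// decompressed data and the number of compressed bytes zlib consumed,
// i.e. where the deflate stream ended.
function inflateEntry(buf) {
  const { buffer, engine } = zlib.inflateRawSync(buf, { info: true });
  return { data: buffer, compressedLength: engine.bytesWritten };
}

// Usage: compress some text, then append bytes that are not deflate data.
const compressed = zlib.deflateRawSync(Buffer.from('hello hello hello hello'));
const trailing = Buffer.from([0x50, 0x4b, 0x07, 0x08, 0xde, 0xad, 0xbe, 0xef]);
const { data, compressedLength } = inflateEntry(Buffer.concat([compressed, trailing]));
console.log(data.toString(), compressedLength === compressed.length);
```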

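Or is this a crazy thing to do, since it means I'd be inflating on the UI thread, and what I should really do is save the downloaded file and use an NPM package once the download finishes? Hmm.. well.. either the network is faster than inflation, in which case streaming inflation would slow the network down (bad), or the network is slower than inflation, in which case why inflate while streaming (which I can't figure out how to do anyway) when I could simply save to disk and reload-and-inflate while I sit waiting for the network..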

Still, for my own edification, I'd like to know whether Node supports streaming inflation.

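var fs = require('fs');
var zlib = require('zlib');
var data = bufferOfChunkOfDeflatedData; // placeholder: a Buffer of compressed bytes
var path = 'inflated.out';              // placeholder output path
// zip entries (method 8) are raw deflate with no zlib header, hence createInflateRaw
var inflate = zlib.createInflateRaw();
inflate.pipe(fs.createWriteStream(path));
// write the compressed chunk to the inflater, not to the file stream that pipe() returns
var result = inflate.write(data);
// but result only reports backpressure; it doesn't indicate whether the deflate stream has terminated...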
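Listening for the inflater's `'end'` event, rather than inspecting `write()`'s return value, is one way to learn when the deflate data is finished. A sketch assuming the chunks arrive as Buffers; `inflateChunks` is a hypothetical helper name.

```javascript
const zlib = require('zlib');

// Feed raw-deflate chunks (for example, pieces of a download) to an InflateRaw
// stream. Resolves once the readable side ends, with the inflated bytes and the
// number of input bytes zlib consumed, which marks the end of the deflate data.
function inflateChunks(chunks) {
  return new Promise((resolve, reject) => {
    const inflater = zlib.createInflateRaw();
    const out = [];
    inflater.on('data', (d) => out.push(d));
    inflater.on('error', reject);
    inflater.on('end', () => {
      resolve({ data: Buffer.concat(out), consumed: inflater.bytesWritten });
    });
    // write() only reports backpressure; completion is signalled by 'end'.
    for (const chunk of chunks) inflater.write(chunk);
    inflater.end();
  });
}
```

Because `bytesWritten` counts only the input zlib consumed, `consumed` is also the offset, counted from the first byte fed in, of whatever follows the deflate data.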

A description of deflate headers and how they delimit the stream: https://www.bolet.org/~pornin/deflate-flush-fr.html
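For reference, RFC 1951 starts every deflate block with a 3-bit header: BFINAL marks the last block, and BTYPE says how the block is coded. Only stored blocks carry an explicit length (LEN/NLEN); compressed blocks end with an end-of-block code, so their end is found by decoding rather than by reading a length field. A small sketch (`firstBlockHeader` is a hypothetical name) that reads the header of a stored block produced with `level: 0`:

```javascript
const zlib = require('zlib');

// Read the 3-bit header at the start of a deflate block (RFC 1951, 3.2.3).
// Deflate packs bits starting at the least significant bit of each byte:
// bit 0 is BFINAL (1 = last block), bits 1-2 are BTYPE
// (0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman, 3 = reserved).
function firstBlockHeader(raw) {
  const b = raw[0];
  return { bfinal: b & 1, btype: (b >> 1) & 3 };
}

// A stored block (level 0) is byte aligned after its header and carries
// explicit LEN and NLEN (one's complement of LEN) fields.
const input = Buffer.from('abcdefghij');
const stored = zlib.deflateRawSync(input, { level: 0 });
const hdr = firstBlockHeader(stored);
const len = stored.readUInt16LE(1);
const nlen = stored.readUInt16LE(3);
console.log(hdr, len, (~nlen & 0xffff) === len);
```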


In-memory streams: https://www.npmjs.com/package/memory-streams


Well, this guy just keeps pulling data until he hits his magic signature! :) https://github.com/EvanOxfeld/node-unzip/blob/5a62ecbcef6523708bb8b37decaf6e41728ac7fc/lib/parse.js#L152
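A sketch of the signature-scanning heuristic described above, using only `Buffer.indexOf`. `findDescriptorBySignature` is a hypothetical name. The signature is optional in the spec, and the same four bytes can occur inside compressed data by chance, so a hit is only a guess.

```javascript
// Scan forward for the optional data descriptor signature 0x08074b50,
// stored little-endian on disk as the bytes 50 4b 07 08 ("PK\x07\x08").
const DATA_DESCRIPTOR_SIG = Buffer.from([0x50, 0x4b, 0x07, 0x08]);

function findDescriptorBySignature(buf, from = 0) {
  return buf.indexOf(DATA_DESCRIPTOR_SIG, from); // -1 if not found
}

// Usage: the signature sits right after 6 bytes of fake compressed data.
const sample = Buffer.concat([Buffer.from([1, 2, 3, 4, 5, 6]), DATA_DESCRIPTOR_SIG]);
console.log(findDescriptorBySignature(sample)); // 6
```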


The Node code that configures the convenience methods: https://github.com/nodejs/node/blob/6e56771f2a9707ddf769358a4338224296a6b5fe/lib/zlib.js#L83 Specifically: https://nodejs.org/api/zlib.html#zlib_zlib_inflateraw_buffer_options_callback
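On the convenience methods, and on the UI-thread worry above: Node's docs state that the non-synchronous zlib APIs do their work on the internal thread pool rather than on the main JavaScript thread. A minimal promise-based sketch, assuming `util.promisify` and the `info` option (which makes the callback receive `{ buffer, engine }`); `inflateWhole` is a hypothetical name.

```javascript
const zlib = require('zlib');
const { promisify } = require('util');

// Promise wrapper around the callback-style convenience method.
const inflateRawAsync = promisify(zlib.inflateRaw);

async function inflateWhole(compressed) {
  // The whole input goes in and the whole output comes back as one Buffer;
  // with info: true the engine is returned too, and its bytesWritten is the
  // number of input bytes zlib consumed.
  const { buffer, engine } = await inflateRawAsync(compressed, { info: true });
  return { data: buffer, consumed: engine.bytesWritten };
}
```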


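Hmm, it looks like Node is set up to return the decompressed buffer to the callback as a single chunk; it doesn't look like Node is set up to find the end of the deflate stream.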

https://nodejs.org/api/stream.html#stream_transform_transform_chunk_encoding_callback says "The callback function must be called only when the current chunk is completely consumed.", and here's the spot where Node passes the chunk to zlib: https://github.com/nodejs/node/blob/6e56771f2a9707ddf769358a4338224296a6b5fe/lib/zlib.js#L358. So there's no way to report that a chunk was only partially consumed..
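That said, the Transform contract does not necessarily stop the stream from ending early. In the Node versions I'm aware of, when zlib reports the end of a raw deflate stream while input is still left over, the stream pushes EOF on its own, and `bytesWritten` then marks where the leftover begins. This is an implementation detail rather than something I've seen guaranteed in the docs, so treat this sketch (hypothetical `inflateOneChunk`) as something to verify on your Node version.

```javascript
const zlib = require('zlib');

// Write a single chunk holding a complete raw-deflate stream followed by other
// bytes, and deliberately do NOT call end(). If Node ends the readable side when
// zlib hits the end of the deflate stream with input left over, 'end' fires and
// bytesWritten tells us where the unconsumed tail of the chunk starts.
function inflateOneChunk(chunk) {
  return new Promise((resolve, reject) => {
    const inflater = zlib.createInflateRaw();
    const out = [];
    inflater.on('data', (d) => out.push(d));
    inflater.on('error', reject);
    inflater.on('end', () => {
      resolve({
        data: Buffer.concat(out),
        leftover: chunk.subarray(inflater.bytesWritten),
      });
    });
    inflater.write(chunk);
  });
}
```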


But then again... https://github.com/ZJONSSON/node-unzipper/blob/affbf89b54b121e85dcd31adf7b1dfde58afebb7/lib/parse.js#L161 but not really. It also just checks for the magic sig: https://github.com/ZJONSSON/node-unzipper/blob/affbf89b54b121e85dcd31adf7b1dfde58afebb7/lib/parse.js#L153


And from the zip spec:

4.3.9.3 Although not originally assigned a signature, the value 0x08074b50 has commonly been adopted as a signature value for the data descriptor record. Implementers SHOULD be aware that ZIP files MAY be encountered with or without this signature marking data descriptors and SHOULD account for either case when reading ZIP files to ensure compatibility.

So it looks like everyone is just looking for the signature.
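If the inflater, rather than a signature scan, tells you where the compressed data ended, the descriptor can be read at that offset in either form the spec allows. A sketch for non-ZIP64 entries (12 bytes without the signature, 16 with it); `readDataDescriptor` and its `expected` argument are hypothetical. Since a CRC-32 can itself equal 0x08074b50, it checks which layout's size fields match the sizes actually observed while inflating, instead of trusting the first four bytes alone.

```javascript
const DD_SIG = 0x08074b50;

// Parse a non-ZIP64 data descriptor at `off`, with or without the optional
// signature. `expected` holds the sizes seen while inflating:
// { compressedSize: bytes consumed, uncompressedSize: bytes produced }.
function readDataDescriptor(buf, off, expected) {
  // Does a descriptor whose CRC field starts at `base` carry the expected sizes?
  const sizesMatch = (base) =>
    buf.length >= base + 12 &&
    buf.readUInt32LE(base + 4) === expected.compressedSize &&
    buf.readUInt32LE(base + 8) === expected.uncompressedSize;

  if (buf.readUInt32LE(off) === DD_SIG && sizesMatch(off + 4)) {
    return { crc32: buf.readUInt32LE(off + 4), length: 16 }; // signature present
  }
  if (sizesMatch(off)) {
    return { crc32: buf.readUInt32LE(off), length: 12 }; // no signature
  }
  throw new Error('no data descriptor matching the observed sizes');
}
```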


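Mark says that's a no-no... so don't do that. And be aware that if you use an NPM library for unzipping, there's a good chance the library is doing exactly that. I think doing it properly requires this, from the zlib API docs: https://zlib.net/manual.html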

The Z_BLOCK option assists in appending to or combining deflate streams. To assist in this, on return inflate() always sets strm->data_type to the number of unused bits in the last byte taken from strm->next_in, plus 64 if inflate() is currently decoding the last block in the deflate stream, plus 128 if inflate() returned immediately after decoding an end-of-block code or decoding the complete header up to just before the first byte of the deflate stream. The end-of-block will not be indicated until all of the uncompressed data from that block has been written to strm->next_out. The number of unused bits may in general be greater than seven, except when bit 7 of data_type is set, in which case the number of unused bits will be less than eight. data_type is set as noted here every time inflate() returns for all flush options, and so can be used to determine the amount of currently consumed input in bits.
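Reading the quoted paragraph as arithmetic: write u for the unused bits in the last byte taken from next_in, L for the last-block flag (the +64), B for the end-of-block flag (the +128), and n for the number of input bytes taken so far. The numbers in the example are made up, and as far as I can tell Node's zlib binding does not expose data_type at all.

```latex
\begin{aligned}
\texttt{data\_type} &= u + 64L + 128B, \qquad L, B \in \{0, 1\} \\
\text{bits consumed} &= 8n - u \\[4pt]
n = 100,\ \texttt{data\_type} = 197 &= 5 + 64 + 128 \;\Rightarrow\; u = 5,\ L = 1,\ B = 1 \\
\text{bits consumed} &= 8 \cdot 100 - 5 = 795 = 99 \cdot 8 + 3 \\
\text{next byte-aligned offset} &= \lceil 795 / 8 \rceil = 100
\end{aligned}
```

In this example the deflate data would end 3 bits into the 100th byte; because bit 7 of data_type is set, u is guaranteed to be under 8, so the unused bits are only the tail of that final byte.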

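This seems to suggest that the final compressed bits will not be byte aligned. Yet the ZIP spec seems to say that the data descriptor, which starts with the magic sig that everyone uses but shouldn't, is byte aligned: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT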

4.3.9.1 This descriptor MUST exist if bit 3 of the general purpose bit flag is set (see below). It is byte aligned and immediately follows the last byte of compressed data. This descriptor SHOULD be used only when it was not possible to seek in the output .ZIP file, e.g., when the output .ZIP file was standard output or a non-seekable device. For ZIP64(tm) format archives, the compressed and uncompressed sizes are 8 bytes each.

Why is the end of the deflate stream not byte aligned, while the data descriptor that follows it is?

Is there a good reference implementation?


A reference implementation that uses inflate with Z_BLOCK: https://github.com/madler/zlib/blob/master/examples/gzappend.c


This guy reads the central directory by working backwards from the end of the file: https://github.com/antelle/node-stream-zip/blob/907c8876e8aeed6c33a668bbd06a0f79e7a022ef/node_stream_zip.js#L180 Is that necessary?

This guy seems to think a zip can't be unzipped without reading the whole file to get to the central directory: https://www.npmjs.com/package/yauzl#no-streaming-unzip-api
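For context, "reading the directory backwards" means finding the End of Central Directory record: it is the last record in the file, followed only by an optional comment of up to 65535 bytes, so a reader scans backwards for its signature 0x06054b50 and then jumps to the central directory offset stored in it. A non-ZIP64 sketch with the hypothetical name `findEndOfCentralDirectory`:

```javascript
// Locate the End of Central Directory record (22 bytes plus an optional
// comment) by scanning backwards from the last position it could start at.
// Non-ZIP64 only; ZIP64 archives add further records that this ignores.
const EOCD_SIG = 0x06054b50;
const EOCD_MIN = 22;

function findEndOfCentralDirectory(buf) {
  const stop = Math.max(0, buf.length - EOCD_MIN - 0xffff);
  for (let i = buf.length - EOCD_MIN; i >= stop; i--) {
    if (buf.readUInt32LE(i) !== EOCD_SIG) continue;
    const commentLength = buf.readUInt16LE(i + 20);
    // Reject chance matches: the record plus its comment must end the file.
    if (i + EOCD_MIN + commentLength !== buf.length) continue;
    return {
      offset: i,
      totalEntries: buf.readUInt16LE(i + 10),
      centralDirSize: buf.readUInt32LE(i + 12),
      centralDirOffset: buf.readUInt32LE(i + 16),
      commentLength,
    };
  }
  return null; // not a (non-ZIP64) zip file, or truncated
}
```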

I don't see why that would be. The deflate streams mark their own end... and Mark confirms they can be streamed.


Here is where Node.js checks for Z_STREAM_END!

And it does seem to, since the docs list zlib.constants.Z_STREAM_END as a possible return value.