手动穿过放气

walking through deflate by hand

我正在尝试了解 deflate 的工作原理。也就是说,我想我会尝试根据 RFC1951: DEFLATE Compressed Data Format Specification version 1.3 所说的内容手动解码压缩编码的字符串。我这样生成了 deflate 编码字符串:

$result = gzdeflate('A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED');

echo bin2hex($result);

这给了我这个:

1589c11100000cc166a3cc61ff2dca237709880c45e52c2b08eb043dedb78db8851e

来自 RFC1951 § 3.2.3。区块格式详情:

         Each block of compressed data begins with 3 header bits
         containing the following data:

            first bit       BFINAL
            next 2 bits     BTYPE

...

         BFINAL is set if and only if this is the last block of the data
         set.

         BTYPE specifies how the data are compressed, as follows:

            00 - no compression
            01 - compressed with fixed Huffman codes
            10 - compressed with dynamic Huffman codes
            11 - reserved (error)

0x15 是 0b00010101 位。前 3 位是 000,这将使 BFINAL 0 和 BTYPE 00(无压缩)。

来自稍后的 RFC1951 § 3.2.3。区块格式详情:

           if stored with no compression
              skip any remaining bits in current partially
                 processed byte
              read LEN and NLEN (see next section)
              copy LEN bytes of data to output

来自 RFC1951 § 3.2.4。非压缩块 (BTYPE=00):

     Any bits of input up to the next byte boundary are ignored.
     The rest of the block consists of the following information:

          0   1   2   3   4...
        +---+---+---+---+================================+
        |  LEN  | NLEN  |... LEN bytes of literal data...|
        +---+---+---+---+================================+

     LEN is the number of data bytes in the block.  NLEN is the
     one's complement of LEN.

因此,我移至下一个字节 - 0x89,即位为 0b10001001。前两位 (LEN) 是 10,接下来的两位是 NLEN (00)。但问题就在这里。 00 不是 10 的补码 - 01 是。

此外,如果您想要一个以 0xFF 作为第一个字节的未压缩块怎么办?这不可能吗?

您没有阅读 RFC 的这一部分:

Data elements are packed into bytes in order of increasing bit number within the byte, i.e., starting with the least-significant bit of the byte.

前三位是101,不是000

使用infgen对该流进行解码:

! infgen 2.6 output
!
last
dynamic
count 259 10 16
code 17 4
code 18 3
code 0 4
code 4 3
code 3 1
code 2 3
zeros 65
lens 3 3 4 3 3
zeros 25
lens 3
zeros 138
zeros 22
lens 4 3 3
zeros 3
lens 2 0 0 2 2 3 3
! litlen 65 3
! litlen 66 3
! litlen 67 4
! litlen 68 3
! litlen 69 3
! litlen 95 3
! litlen 256 4
! litlen 257 3
! litlen 258 3
! dist 3 2
! dist 6 2
! dist 7 2
! dist 8 3
! dist 9 3
literal 'A_DEAD_D
match 3 4
literal 'CEDED_A_B
match 3 12
literal 'BABE
match 4 11
match 3 28
match 4 20
literal 'BAC
match 4 13
literal 'D
end