手动穿过放气
walking through deflate by hand
我正在尝试了解 deflate 的工作原理。也就是说,我想我会尝试根据 RFC1951: DEFLATE Compressed Data Format Specification version 1.3 所说的内容手动解码压缩编码的字符串。我这样生成了 deflate 编码字符串:
$result = gzdeflate('A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED');
echo bin2hex($result);
这给了我这个:
1589c11100000cc166a3cc61ff2dca237709880c45e52c2b08eb043dedb78db8851e
来自 RFC1951 § 3.2.3。区块格式详情:
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
...
BFINAL is set if and only if this is the last block of the data
set.
BTYPE specifies how the data are compressed, as follows:
00 - no compression
01 - compressed with fixed Huffman codes
10 - compressed with dynamic Huffman codes
11 - reserved (error)
0x15 是 0b00010101 位。前 3 位是 000,这将使 BFINAL 0 和 BTYPE 00(无压缩)。
来自稍后的 RFC1951 § 3.2.3。区块格式详情:
if stored with no compression
skip any remaining bits in current partially
processed byte
read LEN and NLEN (see next section)
copy LEN bytes of data to output
来自 RFC1951 § 3.2.4。非压缩块 (BTYPE=00):
Any bits of input up to the next byte boundary are ignored.
The rest of the block consists of the following information:
0 1 2 3 4...
+---+---+---+---+================================+
| LEN | NLEN |... LEN bytes of literal data...|
+---+---+---+---+================================+
LEN is the number of data bytes in the block. NLEN is the
one's complement of LEN.
因此,我移至下一个字节 - 0x89,即位为 0b10001001。前两位 (LEN) 是 10,接下来的两位是 NLEN (00)。但问题就在这里。 00 不是 10 的补码 - 01 是。
此外,如果您想要一个以 0xFF 作为第一个字节的未压缩块怎么办?这不可能吗?
您没有阅读 RFC 的这一部分:
Data elements are packed into bytes in order of increasing bit number
within the byte, i.e., starting with the least-significant bit of the
byte.
前三位是101
,不是000
。
使用infgen对该流进行解码:
! infgen 2.6 output
!
last
dynamic
count 259 10 16
code 17 4
code 18 3
code 0 4
code 4 3
code 3 1
code 2 3
zeros 65
lens 3 3 4 3 3
zeros 25
lens 3
zeros 138
zeros 22
lens 4 3 3
zeros 3
lens 2 0 0 2 2 3 3
! litlen 65 3
! litlen 66 3
! litlen 67 4
! litlen 68 3
! litlen 69 3
! litlen 95 3
! litlen 256 4
! litlen 257 3
! litlen 258 3
! dist 3 2
! dist 6 2
! dist 7 2
! dist 8 3
! dist 9 3
literal 'A_DEAD_D
match 3 4
literal 'CEDED_A_B
match 3 12
literal 'BABE
match 4 11
match 3 28
match 4 20
literal 'BAC
match 4 13
literal 'D
end
我正在尝试了解 deflate 的工作原理。也就是说,我想我会尝试根据 RFC1951: DEFLATE Compressed Data Format Specification version 1.3 所说的内容手动解码压缩编码的字符串。我这样生成了 deflate 编码字符串:
$result = gzdeflate('A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED');
echo bin2hex($result);
这给了我这个:
1589c11100000cc166a3cc61ff2dca237709880c45e52c2b08eb043dedb78db8851e
来自 RFC1951 § 3.2.3。区块格式详情:
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
...
BFINAL is set if and only if this is the last block of the data
set.
BTYPE specifies how the data are compressed, as follows:
00 - no compression
01 - compressed with fixed Huffman codes
10 - compressed with dynamic Huffman codes
11 - reserved (error)
0x15 是 0b00010101 位。前 3 位是 000,这将使 BFINAL 0 和 BTYPE 00(无压缩)。
来自稍后的 RFC1951 § 3.2.3。区块格式详情:
if stored with no compression
skip any remaining bits in current partially
processed byte
read LEN and NLEN (see next section)
copy LEN bytes of data to output
来自 RFC1951 § 3.2.4。非压缩块 (BTYPE=00):
Any bits of input up to the next byte boundary are ignored.
The rest of the block consists of the following information:
0 1 2 3 4...
+---+---+---+---+================================+
| LEN | NLEN |... LEN bytes of literal data...|
+---+---+---+---+================================+
LEN is the number of data bytes in the block. NLEN is the
one's complement of LEN.
因此,我移至下一个字节 - 0x89,即位为 0b10001001。前两位 (LEN) 是 10,接下来的两位是 NLEN (00)。但问题就在这里。 00 不是 10 的补码 - 01 是。
此外,如果您想要一个以 0xFF 作为第一个字节的未压缩块怎么办?这不可能吗?
您没有阅读 RFC 的这一部分:
Data elements are packed into bytes in order of increasing bit number within the byte, i.e., starting with the least-significant bit of the byte.
前三位是101
,不是000
。
使用infgen对该流进行解码:
! infgen 2.6 output
!
last
dynamic
count 259 10 16
code 17 4
code 18 3
code 0 4
code 4 3
code 3 1
code 2 3
zeros 65
lens 3 3 4 3 3
zeros 25
lens 3
zeros 138
zeros 22
lens 4 3 3
zeros 3
lens 2 0 0 2 2 3 3
! litlen 65 3
! litlen 66 3
! litlen 67 4
! litlen 68 3
! litlen 69 3
! litlen 95 3
! litlen 256 4
! litlen 257 3
! litlen 258 3
! dist 3 2
! dist 6 2
! dist 7 2
! dist 8 3
! dist 9 3
literal 'A_DEAD_D
match 3 4
literal 'CEDED_A_B
match 3 12
literal 'BABE
match 4 11
match 3 28
match 4 20
literal 'BAC
match 4 13
literal 'D
end