In which order should I put LEN/CRC/DATA in a message? Should the CRC protect the LEN field?

Question

The article "Xz format inadequate for long-term archiving" contains section 2.5:

According to Koopman (p. 50), one of the "Seven Deadly Sins" (i.e., bad ideas) of CRC and checksum use is failing to protect a message length field. This causes vulnerabilities due to framing errors. Note that the effects of a framing error in a data stream are more serious than what Figure 1 suggests. Not only data at a random position are interpreted as the CRC. Whatever data that follow the bogus CRC will be interpreted as the beginning of the following field, preventing the successful decoding of any remaining data in the stream.

He is talking about the case where the message is laid out like this:

ID LEN DATA CRC

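If LEN is corrupted, a CRC at a random position is used. But I don't understand why this is a problem. At that random position there will almost surely be an invalid CRC value, so the error is detected.

He talks about decoding the data that follows. I don't see how the following data could be decoded even if LEN were protected. If LEN is corrupted, the next message cannot be found in either case.

For example, PNG doesn't protect the length field.

So why is it clearly better when the LEN field is protected by the CRC?

If I were to design a message structure, what would be the best approach? What order should I use, and what should I protect with the CRC? Suppose the message has the following parts: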

message type ID (variable-length integer)
message length (variable-length integer)
CRC
the message data itself

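My current design is this:

CRC, protecting the whole message
message type ID (variable-length integer)
message length (variable-length integer)
the message data itself

Does this approach have any drawbacks?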

Answer 1

What Koopman actually says (here) is:

Failing to protect message length field Results in pointing to data as FCS, giving HD=1

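HD is the Hamming distance. What this is saying is that error detection is weakened if a corrupted length makes you pick up part of the data as the (bogus) check value instead of the actual check value: even a single-bit error in the length field is then no longer guaranteed to be detected. To really do it right, you should protect the length field and the other header values with their own check value, placed before the data.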
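To make HD=1 concrete, here is a minimal sketch of the `ID LEN DATA CRC` layout with an unprotected length. The one-byte ID and LEN fields, the CRC-32 over DATA only, and the names `encode` and `decode` are assumptions for illustration, not anything from Koopman or the xz article. The payload is deliberately contrived so that part of it looks like a valid check value. With random data a match is very unlikely (roughly 2^-32 for a 32-bit CRC), but the guarantee that every single-bit error is caught is gone.

```python
import struct
import zlib


def encode(msg_id: int, data: bytes) -> bytes:
    """Layout: ID (1 byte), LEN (1 byte), DATA, CRC-32 over DATA only."""
    crc = struct.pack("<I", zlib.crc32(data))
    return bytes([msg_id, len(data)]) + data + crc


def decode(buf: bytes) -> tuple:
    """Trust LEN blindly, then check the CRC found where LEN points."""
    msg_id, length = buf[0], buf[1]
    data = buf[2:2 + length]
    (crc,) = struct.unpack_from("<I", buf, 2 + length)
    if zlib.crc32(data) != crc:
        raise ValueError("CRC mismatch")
    return msg_id, data


# Contrived payload: bytes 4..7 equal the CRC-32 of bytes 0..3.
head = b"ABCD"
payload = head + struct.pack("<I", zlib.crc32(head)) + b"WXYZ"
frame = bytearray(encode(1, payload))

frame[1] ^= 0x08  # a single bit flip in LEN: 12 becomes 4

# The decoder now reads payload bytes 4..7 as the check value and accepts
# a truncated message without noticing the error.
result = decode(bytes(frame))
```

Here `result` is `(1, b"ABCD")`: one flipped bit produced a message that passes the check, which is what a Hamming distance of 1 means.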

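As for your design, the drawback of putting the CRC first is that you have to buffer the entire message to compute the CRC before you can write the message to the stream. You could instead do: type id, length, header crc, message, message crc.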
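The suggested layout (type id, length, header crc, message, message crc) could be sketched as follows. This is only one possible reading: the LEB128-style varints, the little-endian CRC-32 from `zlib`, and the function names are assumptions, since neither the question nor the answer fixes an encoding. Truncated input is not handled here.

```python
import struct
import zlib


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an LEB128-style varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple:
    """Decode a varint starting at pos; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_frame(type_id: int, payload: bytes) -> bytes:
    """type id, length, header CRC, payload, payload CRC."""
    header = encode_varint(type_id) + encode_varint(len(payload))
    header_crc = struct.pack("<I", zlib.crc32(header))
    payload_crc = struct.pack("<I", zlib.crc32(payload))
    return header + header_crc + payload + payload_crc


def decode_frame(buf: bytes, pos: int = 0) -> tuple:
    """Return (type id, payload, position of the next frame)."""
    start = pos
    type_id, pos = decode_varint(buf, pos)
    length, pos = decode_varint(buf, pos)
    (header_crc,) = struct.unpack_from("<I", buf, pos)
    # Verify the header before the length is used for anything.
    if zlib.crc32(buf[start:pos]) != header_crc:
        raise ValueError("header CRC mismatch; length is not trustworthy")
    pos += 4
    payload = buf[pos:pos + length]
    pos += length
    (payload_crc,) = struct.unpack_from("<I", buf, pos)
    if zlib.crc32(payload) != payload_crc:
        raise ValueError("payload CRC mismatch")
    return type_id, payload, pos + 4
```

Because the payload CRC trails the payload, a writer can emit the header right away and update the checksum chunk by chunk with `zlib.crc32(chunk, running_crc)` while streaming, instead of buffering the whole message first.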


crc

data-integrity