Fletchers16校验和适合小数据吗？

Question

使用 wikipedia Fletcher's checksum 上的直接实现，我们得到了 "BCA" 和 "CAB" 以及 "BAC" 和 "ACB" 等数据的相同校验和.

这是预期的吗？
Fletcher16 校验和不应该考虑块的顺序吗？

如以下代码所示，可以通过将索引与数据进行“或”运算来轻松修复缺陷....

uint16_t fletcher16( uint8_t *data, int count )
{
   uint16_t sum1 = 0;
   uint16_t sum2 = 0;
   int index;

   for( index = 0; index < count; ++index )
   {
      //sum1 = (sum1 + data[index]) % 255; // Original
      sum1 = (sum1 + index | data[index]) % 255; // The "fix"
      sum2 = (sum2 + sum1) % 255;
   }

   return (sum2 << 8) | sum1;
}

Answer 1

... we get the same checksum... Is this expected?

是的，因为这是可能的。校验和是 16 位的（511 组合永远不会出现：0x..FF、0xFF..）所以 3 个字符串 24 位肯定会发生冲突。 Pigeonhole principle

Should not the Fletcher16 checksum account for the order of the blocks?

确实如此。只是该算法很容易与 select 相似的输入发生冲突。另见 Hamming distance

顺便说一句，如果使用字符串的长度或大小（还要检查 空字符 ），原始算法会给出不同的结果。此外，4 个输入字符串给出了不同的结果对。

  printf("%x\n", fletcher16("BCA",3)); // 8ec6
  printf("%x\n", fletcher16("CAB",3)); // 8ec6 same
  printf("%x\n", fletcher16("BAC",3)); // 8cc6 
  printf("%x\n", fletcher16("ACB",3)); // 8cc6 same

  printf("%x\n", fletcher16("BCA",4)); // 55c6
  printf("%x\n", fletcher16("CAB",4)); // 55c6 same
  printf("%x\n", fletcher16("BAC",4)); // 53c6
  printf("%x\n", fletcher16("ACB",4)); // 53c6 same

OP 的建议改进削弱了校验和，也通过索引中的 or-ing 忽略了每个阶段的 select 位。建议改为异或或添加。

轻微尼特：

// return (sum2 << 8) | sum1;
return (1u*sum2 << 8) | sum1;

此更改对所有 int/unsigned 大小都没有影响，但在 int/unsigned 为 16 位时避免了实现定义的行为。最好确保代码不会左移到符号位。

some_int % 255 执行有符号余数。在设备上，例如简单的嵌入式设备，无符号余数肯定同样快或更快。 % 255u 不会有任何损失，但可能会有所改进。

[编辑]

尽管 OP 有 "fixed" 短字符串代码，但它违背了 fletcher16() 的设计参数，即执行速度。

详情：如果我们抛开%255，sum1是data[0] + ... + data[count-1])，sum2是data[0]*(count) + data[0]*(count-1) + ... + data[count-1]*(1)，创建1就变得容易了， 2,3 等具有低值的长字符串，如果有的话，很少发生 %255 操作。

请注意，sum2 是根据顺序有效创建不同校验和的部分。如果数组元素的总和永远不会达到 255（这发生在 OP 的 4 个测试用例中），sum1 对于仅顺序不同的任何 2 个字符串都是相同的。

为了有效地 "mix up/hash" 具有低值的短字符串，需要不同的算法。

也许只在 count < 8 时使用变体：

sum1 = (sum1 + index + data[index]) % 255;

Fletchers16校验和适合小数据吗？

Is Fletchers16 checksum suitable for small data?

c

checksum