树哈希:如何验证一个范围是否是树哈希对齐的?

Tree Hash: How to verify if a range is tree-hash-aligned?

"Tree Hash" 是一个类似于 Merkle Tree/Tiger 哈希树的概念,Amazon Glacier 使用它来验证给定数据流的子集的数据完整性。

为了在检索数据时从 Amazon Glacier 接收树哈希,指定的字节范围必须是 "tree hash aligned"。

The concept of "tree hash aligned" is described here.

引用开发者文档:

A range [A, B] is tree-hash aligned with respect to an archive if and only if when a new tree hash is built over [A, B], the root of the tree hash of that range is equivalent to a node in the tree hash of the whole archive. [...]

Consider [P, Q) as the range query for an archive of N megabytes (MB) and P and Q are multiples of one MB. Note that the actual inclusive range is [P MB, Q MB – 1 byte], but for simplicity, we show it as [P, Q). With these considerations, then

  • If P is an odd number, there is only one possible tree-hash aligned range—that is [P, P + 1 MB).
  • If P is an even number and k is the maximum number, where P can be written as 2k * X, then there are at most k tree-hash aligned ranges that start with P. X is an integer greater than 0. The tree-hash aligned ranges fall in the following categories:
    • For each i, where (0 <= i <= k) and where P + 2i < N, then [P, Q + 2i) is a tree-hash aligned range.
    • P = 0 is the special case where A = 2[lgN]*0

现在的问题是:如果给定范围 [startByte, endByte] 是树哈希对齐的,我如何以编程方式验证?编程语言无所谓。

测试用例:

[0,0) => true
[0,1) => true
[0,2) => false
[0,3) => true
[1,2) => false
[4,5) => true

这里是 basic 实现 is_treehash_aligned 函数 Python:

import math

def max_k(x):
    return 1 + max_k(x/2) if x % 2 == 0 else 0

def is_treehash_aligned(P, Q):

    if (Q < P):
        return False
    elif (P % 2 == 1):
        return Q == P
    else:
        ilen = Q - P + 1    # size(interval)
        if  not (((ilen & (ilen - 1)) == 0) and ilen != 0):
            return False    # size(interval) ~ not power of two
        if P == 0:
            return True
        else:
            k = max_k(P)
            i = int(math.log(ilen, 2))
            return i <= k

if (__name__ == "__main__"):
    ranges = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 2), \
              (4, 5), (6, 7), (2, 4), (6, 8), (5, 6), \
              (4, 4), (1, 1), (4194304, 5242879), \
              (4194304, 5242880), (4194304, 5242881)]

    for r in ranges:
        ret = is_treehash_aligned(*r)
        print("[" + str(r[0]) + ", " + str(r[1]) + ") => " + str(ret))

输出为:

[0, 0) => True
[0, 1) => True
[0, 2) => False
[0, 3) => True
[1, 2) => False
[4, 5) => True
[6, 7) => True
[2, 4) => False
[6, 8) => False
[5, 6) => False
[4, 4) => True
[1, 1) => True
[4194304, 5242879) => True
[4194304, 5242880) => False
[4194304, 5242881) => False

注意:

  • 我采用了你的间隔表示法,而不是说明提供的间隔。因此,可以假设每个间隔是 兆字节对齐的.
  • 测试用例的结果 [4194304, 5242880) 与您在原始问题中提出的不同,尽管我仔细检查了它并且我有点相信它是正确的。
  • 如果 N 已知,但在您的测试用例中不是这种情况,那么当 P == 0 也应该接受任何范围 s.t。 Q >= floor(N),而且不仅是大小为 2 的幂 的那些。对于 子树 也可以进行类似的论证, 右边 没有其他子树。这两种情况都会匹配给定 here 树哈希对齐 定义 ,但不匹配 说明 用于识别它。

注释:问题和问题的description似乎都令人困惑。

  1. 测试用例用符号 [A, B) 给出,其中 A 是起始块的索引,B 是结束块的索引 (included),假设整个档案由一个数组组成--索引从0-- of N 每个块大小 1 MB(可能最后一个除外)。例如:

    [0,0) => true
    [0,1) => true
    [0,2) => false
    [0,3) => true
    [1,2) => false
    [4,5) => true
    

    但是,说明 假设范围是用符号 [P MB, Q MB – 1 byte].

  2. 给出的
  3. 说明 具有误导性

    例如,这里写着:

    If P is an even number and k is the maximum number, where P can be written as 2k * X, then there are at most k tree-hash aligned ranges that start with P

    power 符号似乎被省略了,可能是由于错误的 HTML 代码,因为句子应该是 "the largest k s.t. P = (2^k)*X"

    另一个例子是:

    For each i, where (0 <= i <= k) and where P + 2i < N, then [P, Q + 2i) is a tree-hash aligned range.

    例如假设 Q = P + 1i > 0k > 0。那么区间 [P, Q + 2^i) 的大小为 = Q + 2^i - P = P + 1 + 2^i - P = 2^i + 1 > 1。但是,根据构造,不存在这种 tree-hash 对齐的范围,其奇数大小大于 1。命题应该是:"[...],那么[P, P + 2^i)就是一个树哈希对齐的范围".