树哈希:如何验证一个范围是否是树哈希对齐的?
Tree Hash: How to verify if a range is tree-hash-aligned?
"Tree Hash" 是一个类似于 Merkle Tree/Tiger 哈希树的概念,Amazon Glacier 使用它来验证给定数据流的子集的数据完整性。
为了在检索数据时从 Amazon Glacier 接收树哈希,指定的字节范围必须是 "tree hash aligned"。
The concept of "tree hash aligned" is described here.
引用开发者文档:
A range [A, B] is tree-hash aligned with respect to an archive if and only if when a new tree hash is built over [A, B], the root of the tree hash of that range is equivalent to a node in the tree hash of the whole archive. [...]
Consider [P, Q) as the range query for an archive of N megabytes (MB) and P and Q are multiples of one MB. Note that the actual inclusive range is [P MB, Q MB – 1 byte], but for simplicity, we show it as [P, Q). With these considerations, then
- If P is an odd number, there is only one possible tree-hash aligned range—that is [P, P + 1 MB).
- If P is an even number and k is the maximum number, where P can be written as 2k * X, then there are at most k tree-hash aligned ranges that start with P. X is an integer greater than 0. The tree-hash aligned ranges fall in the following categories:
- For each i, where (0 <= i <= k) and where P + 2i < N, then [P, Q + 2i) is a tree-hash aligned range.
- P = 0 is the special case where A = 2[lgN]*0
现在的问题是:如果给定范围 [startByte, endByte] 是树哈希对齐的,我如何以编程方式验证?编程语言无所谓。
测试用例:
[0,0) => true
[0,1) => true
[0,2) => false
[0,3) => true
[1,2) => false
[4,5) => true
这里是 basic 实现 is_treehash_aligned 函数 Python:
import math
def max_k(x):
return 1 + max_k(x/2) if x % 2 == 0 else 0
def is_treehash_aligned(P, Q):
if (Q < P):
return False
elif (P % 2 == 1):
return Q == P
else:
ilen = Q - P + 1 # size(interval)
if not (((ilen & (ilen - 1)) == 0) and ilen != 0):
return False # size(interval) ~ not power of two
if P == 0:
return True
else:
k = max_k(P)
i = int(math.log(ilen, 2))
return i <= k
if (__name__ == "__main__"):
ranges = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 2), \
(4, 5), (6, 7), (2, 4), (6, 8), (5, 6), \
(4, 4), (1, 1), (4194304, 5242879), \
(4194304, 5242880), (4194304, 5242881)]
for r in ranges:
ret = is_treehash_aligned(*r)
print("[" + str(r[0]) + ", " + str(r[1]) + ") => " + str(ret))
输出为:
[0, 0) => True
[0, 1) => True
[0, 2) => False
[0, 3) => True
[1, 2) => False
[4, 5) => True
[6, 7) => True
[2, 4) => False
[6, 8) => False
[5, 6) => False
[4, 4) => True
[1, 1) => True
[4194304, 5242879) => True
[4194304, 5242880) => False
[4194304, 5242881) => False
注意:
- 我采用了你的间隔表示法,而不是说明提供的间隔。因此,可以假设每个间隔是 兆字节对齐的.
- 测试用例的结果
[4194304, 5242880)
与您在原始问题中提出的不同,尽管我仔细检查了它并且我有点相信它是正确的。
- 如果
N
已知,但在您的测试用例中不是这种情况,那么当 P == 0
也应该接受任何范围 s.t。 Q >= floor(N)
,而且不仅是大小为 2 的幂 的那些。对于 子树 也可以进行类似的论证, 右边 没有其他子树。这两种情况都会匹配给定 here 的 树哈希对齐 的 定义 ,但不匹配 说明 用于识别它。
注释:问题和问题的description似乎都令人困惑。
测试用例用符号 [A, B)
给出,其中 A
是起始块的索引,B
是结束块的索引 (included),假设整个档案由一个数组组成--索引从0-- of N 每个块大小 1 MB(可能最后一个除外)。例如:
[0,0) => true
[0,1) => true
[0,2) => false
[0,3) => true
[1,2) => false
[4,5) => true
但是,说明 假设范围是用符号 [P MB, Q MB – 1 byte]
.
给出的
说明 具有误导性。
例如,这里写着:
If P is an even number and k is the maximum number, where P can be written as 2k * X, then there are at most k tree-hash aligned ranges that start with P
power 符号似乎被省略了,可能是由于错误的 HTML 代码,因为句子应该是 "the largest k
s.t. P = (2^k)*X
"。
另一个例子是:
For each i, where (0 <= i <= k) and where P + 2i < N, then [P, Q + 2i) is a tree-hash aligned range.
例如假设 Q = P + 1
、i > 0
和 k > 0
。那么区间 [P, Q + 2^i)
的大小为 = Q + 2^i - P = P + 1 + 2^i - P = 2^i + 1 > 1
。但是,根据构造,不存在这种 tree-hash 对齐的范围,其奇数大小大于 1。命题应该是:"[...],那么[P, P + 2^i)
就是一个树哈希对齐的范围".
"Tree Hash" 是一个类似于 Merkle Tree/Tiger 哈希树的概念,Amazon Glacier 使用它来验证给定数据流的子集的数据完整性。
为了在检索数据时从 Amazon Glacier 接收树哈希,指定的字节范围必须是 "tree hash aligned"。
The concept of "tree hash aligned" is described here.
引用开发者文档:
A range [A, B] is tree-hash aligned with respect to an archive if and only if when a new tree hash is built over [A, B], the root of the tree hash of that range is equivalent to a node in the tree hash of the whole archive. [...]
Consider [P, Q) as the range query for an archive of N megabytes (MB) and P and Q are multiples of one MB. Note that the actual inclusive range is [P MB, Q MB – 1 byte], but for simplicity, we show it as [P, Q). With these considerations, then
- If P is an odd number, there is only one possible tree-hash aligned range—that is [P, P + 1 MB).
- If P is an even number and k is the maximum number, where P can be written as 2k * X, then there are at most k tree-hash aligned ranges that start with P. X is an integer greater than 0. The tree-hash aligned ranges fall in the following categories:
- For each i, where (0 <= i <= k) and where P + 2i < N, then [P, Q + 2i) is a tree-hash aligned range.
- P = 0 is the special case where A = 2[lgN]*0
现在的问题是:如果给定范围 [startByte, endByte] 是树哈希对齐的,我如何以编程方式验证?编程语言无所谓。
测试用例:
[0,0) => true
[0,1) => true
[0,2) => false
[0,3) => true
[1,2) => false
[4,5) => true
这里是 basic 实现 is_treehash_aligned 函数 Python:
import math
def max_k(x):
return 1 + max_k(x/2) if x % 2 == 0 else 0
def is_treehash_aligned(P, Q):
if (Q < P):
return False
elif (P % 2 == 1):
return Q == P
else:
ilen = Q - P + 1 # size(interval)
if not (((ilen & (ilen - 1)) == 0) and ilen != 0):
return False # size(interval) ~ not power of two
if P == 0:
return True
else:
k = max_k(P)
i = int(math.log(ilen, 2))
return i <= k
if (__name__ == "__main__"):
ranges = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 2), \
(4, 5), (6, 7), (2, 4), (6, 8), (5, 6), \
(4, 4), (1, 1), (4194304, 5242879), \
(4194304, 5242880), (4194304, 5242881)]
for r in ranges:
ret = is_treehash_aligned(*r)
print("[" + str(r[0]) + ", " + str(r[1]) + ") => " + str(ret))
输出为:
[0, 0) => True
[0, 1) => True
[0, 2) => False
[0, 3) => True
[1, 2) => False
[4, 5) => True
[6, 7) => True
[2, 4) => False
[6, 8) => False
[5, 6) => False
[4, 4) => True
[1, 1) => True
[4194304, 5242879) => True
[4194304, 5242880) => False
[4194304, 5242881) => False
注意:
- 我采用了你的间隔表示法,而不是说明提供的间隔。因此,可以假设每个间隔是 兆字节对齐的.
- 测试用例的结果
[4194304, 5242880)
与您在原始问题中提出的不同,尽管我仔细检查了它并且我有点相信它是正确的。 - 如果
N
已知,但在您的测试用例中不是这种情况,那么当P == 0
也应该接受任何范围 s.t。Q >= floor(N)
,而且不仅是大小为 2 的幂 的那些。对于 子树 也可以进行类似的论证, 右边 没有其他子树。这两种情况都会匹配给定 here 的 树哈希对齐 的 定义 ,但不匹配 说明 用于识别它。
注释:问题和问题的description似乎都令人困惑。
测试用例用符号
[A, B)
给出,其中A
是起始块的索引,B
是结束块的索引 (included),假设整个档案由一个数组组成--索引从0-- of N 每个块大小 1 MB(可能最后一个除外)。例如:[0,0) => true [0,1) => true [0,2) => false [0,3) => true [1,2) => false [4,5) => true
但是,说明 假设范围是用符号
[P MB, Q MB – 1 byte]
. 给出的
说明 具有误导性。
例如,这里写着:
If P is an even number and k is the maximum number, where P can be written as 2k * X, then there are at most k tree-hash aligned ranges that start with P
power 符号似乎被省略了,可能是由于错误的 HTML 代码,因为句子应该是 "the largest
k
s.t.P = (2^k)*X
"。另一个例子是:
For each i, where (0 <= i <= k) and where P + 2i < N, then [P, Q + 2i) is a tree-hash aligned range.
例如假设
Q = P + 1
、i > 0
和k > 0
。那么区间[P, Q + 2^i)
的大小为= Q + 2^i - P = P + 1 + 2^i - P = 2^i + 1 > 1
。但是,根据构造,不存在这种 tree-hash 对齐的范围,其奇数大小大于 1。命题应该是:"[...],那么[P, P + 2^i)
就是一个树哈希对齐的范围".