Why doesn't my SHA-256 checksum match the checksum returned by AWS Glacier?

I have an archive file on an Ubuntu server. I uploaded it to AWS Glacier using the AWS CLI. When the upload finished, AWS returned this checksum:

{"checksum": "6c126443c882b8b0be912c91617a5765050d7c99dc43b9d30e47c42635ab02d5"}

But when I compute the checksum on my own server like this:

sunny@server:~$ sha256sum backup.zip

it returns this checksum:

5ba29292a350c4a8f194c78dd0ef537ec21ca075f1fe649ae6296c7100b25ba8

Why do the two checksums differ?

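Although the checksum Glacier returns is built from SHA-256, it is not a plain SHA-256 digest of the whole object. Instead, Glacier computes a SHA-256 hash for each 1 MiB chunk of data, then hashes each adjacent pair of those hashes, and repeats that pairing until a single hash remains. A hash left without a partner at any level is carried up to the next level unchanged. See the documentation for details.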
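To make the pairing concrete, here is a minimal worked sketch for a hypothetical 3 MiB input, which splits into exactly three chunks. The payload and the names `h1`, `h2`, `h3`, and `h12` are made up for illustration; only `hashlib` from the standard library is used.

```python
import hashlib

MIB = 1024 * 1024

# Hypothetical 3 MiB payload standing in for an archive (illustration only)
data = b"a" * MIB + b"b" * MIB + b"c" * MIB

# Leaf level: one SHA-256 digest per 1 MiB chunk
h1, h2, h3 = (hashlib.sha256(data[i:i + MIB]).digest()
              for i in range(0, len(data), MIB))

# Next level: h1 and h2 are paired; h3 has no partner and is carried up as-is
h12 = hashlib.sha256(h1 + h2).digest()

# Root: hash of the two remaining digests
tree_hash = hashlib.sha256(h12 + h3).hexdigest()

# What sha256sum would report for the same bytes
plain_hash = hashlib.sha256(data).hexdigest()

print("tree hash :", tree_hash)
print("sha256sum :", plain_hash)
```

The two printed values differ, which is the same kind of mismatch seen between the Glacier response and `sha256sum`.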

Here is a simple implementation in Python:

#!/usr/bin/env python3
import hashlib
import sys
import binascii

# Given a file object (opened in binary mode), calculate the checksum used by glacier
def calc_hash_tree(fileobj):
    chunk_size = 1048576

    # Calculate a list of hashes for each chunk in the fileobj
    chunks = []
    while True:
        chunk = fileobj.read(chunk_size)
        if len(chunk) == 0:
            break
        chunks.append(hashlib.sha256(chunk).digest())
    
    # Now calculate each level of the tree till one digest remains
    while len(chunks) > 1:
        next_chunks = []
        while len(chunks) > 1:
            next_chunks.append(hashlib.sha256(chunks.pop(0) + chunks.pop(0)).digest())
        if len(chunks) > 0:
            next_chunks.append(chunks.pop(0))
        chunks = next_chunks

    # The final remaining hash is the root of the tree:
    return binascii.hexlify(chunks[0]).decode("utf-8")

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        print(calc_hash_tree(f))

You can run it on a single file like this:

$ ./glacier_checksum.py backup.zip
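One way to sanity-check a tree-hash implementation without uploading anything: when the input is no larger than 1 MiB the tree has a single leaf, so the result should equal the plain SHA-256 digest, and the two should diverge once the input spans more than one chunk. The sketch below assumes a compact helper named `tree_hash` (a hypothetical name, not part of any AWS library) and uses in-memory data in place of a real file.

```python
import hashlib
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB

def tree_hash(fileobj):
    """Return the hex tree hash of a binary file object (illustrative helper)."""
    # Leaf level: one digest per 1 MiB chunk
    level = []
    while True:
        chunk = fileobj.read(CHUNK_SIZE)
        if not chunk:
            break
        level.append(hashlib.sha256(chunk).digest())
    if not level:
        # Assumption: treat empty input as the SHA-256 of zero bytes
        level = [hashlib.sha256(b"").digest()]

    # Combine adjacent pairs until a single digest remains
    while len(level) > 1:
        paired = [hashlib.sha256(level[i] + level[i + 1]).digest()
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # odd digest is carried up unchanged
        level = paired
    return level[0].hex()

small = b"x" * 1000               # fits in one chunk
large = b"x" * (2 * CHUNK_SIZE + 1)  # spans three chunks

small_matches = tree_hash(io.BytesIO(small)) == hashlib.sha256(small).hexdigest()
large_matches = tree_hash(io.BytesIO(large)) == hashlib.sha256(large).hexdigest()
print(small_matches, large_matches)
```

If the first comparison is false, the chunking or hashing is wrong; if the second is true, the pairing step is not being applied.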