How to verify integrity of files using digest in python (SHA256SUMS)

I have a set of files and a SHA256SUMS digest file that contains a sha256() hash for each of the files. What is the best way to verify the integrity of my files with python?

For example, here is how I would download the Debian 10 net installer SHA256SUMS digest file and download/verify its MANIFEST file in BASH:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 02:11:20--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K  71.7KB/s    in 1.0s    

2020-08-25 02:11:22 (71.7 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 02:11:27--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 02:11:28 (128 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ sha256sum --check --ignore-missing SHA256SUMS 
./MANIFEST: OK
user@host:~$ 

What is the best way to do the same operation (download and verify the integrity of the Debian 10 MANIFEST file using the SHA256SUMS file) in python?

You can calculate the sha256sum of each file as described in this blog post:

https://www.quickprogrammingtips.com/python/how-to-calculate-sha256-hash-of-a-file-in-python.html
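
As a minimal sketch of that approach, the helper below hashes one file in fixed-size chunks so that large files never have to fit in memory. The function name `sha256_of_file` and the 64 KiB chunk size are illustrative choices, not something taken from the post.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fd:
        # iter() with a sentinel keeps calling fd.read() until it returns b""
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be compared directly with the first 64 characters of the matching line in a SHA256SUMS file.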

An example implementation that generates a new manifest file might look like this:

import hashlib
from pathlib import Path

# Your output file
output_file = Path("manifest-check")

# Your target directory
p = Path('.')

with open(output_file, "w") as out:
  # Iterate over everything below the target directory
  for path in sorted(p.glob("**/*")):
    # Process regular files only (skip directories and the output file itself)
    if not path.is_file() or path.resolve() == output_file.resolve():
      continue
    # Use a fresh hash object per file so digests do not accumulate
    sha256_hash = hashlib.sha256()
    with open(path, "rb") as f:
      # Read the file by chunks
      for byte_block in iter(lambda: f.read(4096), b""):
        sha256_hash.update(byte_block)
    out.write(str(path) + "\t" + sha256_hash.hexdigest() + "\n")

Alternatively, this seems to be what the manifest-checker pip package does.

You can have a look at its source here https://github.com/TonyFlury/manifest-checker and adjust it for python 3

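The following python script implements a function named integrity_is_ok() that takes the path to a SHA256SUMS file and a list of files to be verified, and returns False if any of the files could not be verified and True otherwise.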

#!/usr/bin/env python3
from hashlib import sha256
import os

# Takes the path (as a string) to a SHA256SUMS file and a list of paths to
# local files. Returns true only if all files' checksums are present in the
# SHA256SUMS file and their checksums match
def integrity_is_ok( sha256sums_filepath, local_filepaths ):

    # first we parse the SHA256SUMS file and convert it into a dictionary
    sha256sums = dict()
    with open( sha256sums_filepath ) as fd:
        for line in fd:
            # sha256 hashes are exactly 64 characters long
            checksum = line[0:64]

            # there is one space followed by one metadata character between the
            # checksum and the filename in the `sha256sum` command output
            filename = os.path.split( line[66:] )[1].strip()
            sha256sums[filename] = checksum

    # now loop through each file that we were asked to check and confirm its
    # checksum matches what was listed in the SHA256SUMS file
    for local_file in local_filepaths:

        local_filename = os.path.split( local_file )[1]

        sha256sum = sha256()
        with open( local_file, 'rb' ) as fd:
            data_chunk = fd.read(1024)
            while data_chunk:
                sha256sum.update(data_chunk)
                data_chunk = fd.read(1024)

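        checksum = sha256sum.hexdigest()
        # .get() returns None for files not listed in SHA256SUMS, so an
        # unlisted file fails verification instead of raising KeyError
        if checksum != sha256sums.get( local_filename ):
            return False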

    return True

if __name__ == '__main__':

    script_dir = os.path.split( os.path.realpath(__file__) )[0]
    sha256sums_filepath = script_dir + '/SHA256SUMS'
    local_filepaths = [ script_dir + '/MANIFEST' ]

    if integrity_is_ok( sha256sums_filepath, local_filepaths ):
        print( "INFO: Checksum OK" )
    else:
        print( "ERROR: Checksum Invalid" )
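
One limitation of keying the dictionary on the bare file name is that a SHA256SUMS file listing the same name under several directories keeps only the last entry. The sketch below is one possible variation, not part of the script above. It assumes the text format produced by `sha256sum` (64 hex characters, a space, one mode character, then the path), keys on the listed relative path, and skips files that are not present locally, roughly in the spirit of `sha256sum --check --ignore-missing`. The names `parse_sha256sums` and `verify_listed_files` are hypothetical, and resolving paths against the directory that holds SHA256SUMS is an assumption of this sketch.

```python
import hashlib
from pathlib import Path


def parse_sha256sums(sums_path):
    """Map each listed relative path to its expected hex digest."""
    expected = {}
    with open(sums_path, encoding="utf-8") as fd:
        for line in fd:
            line = line.rstrip("\n")
            # 64 hex chars + space + mode char (' ' or '*') + at least 1 path char
            if len(line) < 67:
                continue  # skip blank or malformed lines
            digest, name = line[:64], line[66:]
            # Path() drops a leading "./", so "./MANIFEST" becomes "MANIFEST"
            expected[Path(name).as_posix()] = digest.lower()
    return expected


def verify_listed_files(sums_path, base_dir=None):
    """Return {relative_path: bool} for every listed file that exists locally."""
    base = Path(base_dir) if base_dir is not None else Path(sums_path).parent
    results = {}
    for rel_path, digest in parse_sha256sums(sums_path).items():
        target = base / rel_path
        if not target.is_file():
            continue  # assumption: missing files are ignored, not failures
        file_hash = hashlib.sha256()
        with open(target, "rb") as fd:
            for chunk in iter(lambda: fd.read(65536), b""):
                file_hash.update(chunk)
        results[rel_path] = file_hash.hexdigest() == digest
    return results
```

An empty result means none of the listed files were found locally, which a caller may want to treat as a failure rather than a success.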

Here is an example execution of the integrity_is_ok() script, saved as sha256sums_python.py:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 22:40:16--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K   201KB/s    in 0.4s    

2020-08-25 22:40:17 (201 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 22:40:32--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 22:40:32 (13.0 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ ./sha256sums_python.py 
INFO: Checksum OK
user@host:~$ 
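
The two `wget` steps could also be done from Python with the standard library. This is a minimal sketch rather than part of the script above: the helper name `download` and the constant `BASE_URL` are hypothetical, and the mirror URL is simply the one from the transcript, which may no longer serve these files.

```python
import shutil
import urllib.request
from pathlib import Path

# Mirror directory taken from the wget transcript (assumed to still exist)
BASE_URL = "http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/"


def download(url, dest):
    """Stream `url` to the local path `dest` and return `dest` as a Path."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # copyfileobj copies in chunks instead of loading the whole body
        shutil.copyfileobj(response, out)
    return dest


# Example usage (performs network access, so it is left commented out):
# for name in ("SHA256SUMS", "MANIFEST"):
#     download(BASE_URL + name, name)
```

Once both files are saved next to the script, running it as shown above verifies MANIFEST against SHA256SUMS.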

Parts of the above code were adapted from the following answer on Ask Ubuntu: