Python - Convert bytes buffer to file size

I am writing a program that calculates the checksums of a list of files and then compares them against a reference file.

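I am trying to convert the bytes buffer from the hashfile method into a file size in the same unit as os.stat(path).st_size, so that I can update the tqdm progress bar accordingly. (I am trying to implement the last example here.)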

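I have tried many things (len(buf): gives me a processed size far larger than the total; int.from_bytes(): OverflowError - int too large to convert to float; struct.unpack_from(buf): requires reading one byte at a time; various functions for converting bytes), but nothing has worked so far. It seems I don't understand bytes well enough to know what to search for, or how to implement the solutions I find.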

Here is an excerpt of the code:

import hashlib
import os
from tqdm import tqdm

# calculate total size to process
self.assets_size += os.stat(os.path.join(root, f)).st_size

def hashfile(self, progress, afile, hasher, blocksize=65536):
    """
    Checksum buffer
    :param progress: progress bar object
    :param afile: file to process
    :param hasher: checksum algorithm
    :param blocksize: size of the buffer
    :return: hash digest
    """
    buf = afile.read(blocksize)

    while len(buf) > 0:
        self.processed_size += buf  # need to convert from bytes to file size
        hasher.update(buf)
        progress.update(self.processed_size)  # tqdm update
        buf = afile.read(blocksize)

    afile.seek(0)
    return hasher.digest()

def process_file(self, progress, fichier):
    """
    Checks if the file is in the reference dictionary;
    If so, checks if the size of the file matches the one stored in the dictionary;
    If so, calculates the checksum of the file and compares it to the one in the dictionary
    :param progress: progress bar object
    :param fichier: asset file to process
    :return: string outcome of the process
    """
    checksum = self.hashfile(progress, open(fichier, 'rb'), hashlib.sha1())
    # check if checksum matches
    return outcome

def main_process(self):
    """
    Launches and monitors the process and writes a report of the results
    :return: application end
    """
    with tqdm(total=self.assets_size, unit='B', unit_scale=True) as pbar:
        all_results = []

        for f in self.assets.keys():
            results = self.process_file(pbar, f)
            all_results.append(results)

    for r in all_results:
        print(r)

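Thanks to @RadosławCybulski I found the solution. I now understand how tqdm.update() works: it does not set the progress to the value of its argument, it adds the argument to the current progress. Since len(buf) is already a byte count in the same unit as os.stat(path).st_size, no conversion is needed. I updated the hashfile method like this: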

    while len(buf) > 0:
        hasher.update(buf)
        progress.update(len(buf))
        buf = afile.read(blocksize)
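
For anyone who wants to check this behaviour without the rest of the program, here is a minimal self-contained sketch. It assumes a few things that are not in my original code: CountingProgress is a hypothetical stand-in that only mimics the additive update(n) of a tqdm bar, and hash_file and demo are illustrative names. It hashes a temporary file and confirms that the sum of the len(buf) values equals os.stat(path).st_size.

```python
import hashlib
import os
import tempfile


class CountingProgress:
    """Hypothetical stand-in for a tqdm bar: update(n) adds n to the running total."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n


def hash_file(path, progress, hasher, blocksize=65536):
    """Hash the file at `path`, reporting each chunk's size to `progress`."""
    with open(path, "rb") as afile:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for buf in iter(lambda: afile.read(blocksize), b""):
            hasher.update(buf)
            progress.update(len(buf))  # pass the increment, not a running total
    return hasher.digest()


def demo():
    """Write 200,000 bytes to a temporary file and hash it while tracking progress."""
    data = b"x" * 200_000
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        total = os.stat(path).st_size
        progress = CountingProgress(total)
        digest = hash_file(path, progress, hashlib.sha1())
        # Returns: size on disk, bytes reported, and whether the digest matches
        return total, progress.n, digest == hashlib.sha1(data).digest()
    finally:
        os.remove(path)
```

A real bar created with tqdm(total=total, unit='B', unit_scale=True) should be usable in place of CountingProgress, since only update(n) is called on it. This also explains the overshoot I saw earlier: with three full 65536-byte chunks, passing a running total to update() would advance the bar by 65536 + 131072 + 196608 = 393216 bytes, twice the 196608 bytes actually read.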