Python - Convert bytes buffer to file size
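I am writing a program that calculates checksums for a list of files and then compares them against a reference file.
I am trying to convert the bytes buffer from the hashfile method into a file size expressed in the same unit as os.stat(path).st_size, so that I can update a tqdm progress bar accordingly. (I am trying to implement the last example here.)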
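I have tried several things, but nothing has worked so far: len(buf) gives me a processed size far larger than the total; int.from_bytes() raises OverflowError - int too large to convert to float; struct.unpack_from(buf) needs to read one byte at a time; and I also tried various functions for converting bytes. It seems I do not understand bytes well enough to know what to search for, or how to apply the solutions I find.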
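For reference, here is a small sketch of how the attempts listed above relate to each other. The temporary file and its 100,000-byte size are made up for illustration and are not part of the original program. As far as I can tell, len(buf) counts bytes, which is the same unit that os.stat(path).st_size reports, whereas int.from_bytes() decodes the buffer's content as one (potentially enormous) integer rather than measuring its size.

```python
import os
import tempfile

# Hypothetical test file: 100,000 zero bytes written to a temporary path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 100_000)
    path = tmp.name

blocksize = 65536
chunk_sizes = []
with open(path, "rb") as afile:
    buf = afile.read(blocksize)
    while len(buf) > 0:
        chunk_sizes.append(len(buf))  # number of bytes in this chunk
        buf = afile.read(blocksize)

file_size = os.stat(path).st_size
os.remove(path)

print(chunk_sizes)                    # [65536, 34464]
print(sum(chunk_sizes) == file_size)  # True: chunk lengths add up to st_size

# int.from_bytes() does not measure size: it decodes the content as a number.
print(int.from_bytes(b"\x01\x00", "big"))  # 256, although the buffer holds 2 bytes
```

So the per-chunk lengths already sum to the file size; if the processed total overshoots, the cause is more likely in how those lengths are accumulated or reported than in the unit.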
Code excerpt:
import hashlib
import os
from tqdm import tqdm

# calculate total size to process
self.assets_size += os.stat(os.path.join(root, f)).st_size

def hashfile(self, progress, afile, hasher, blocksize=65536):
    """
    Checksum buffer
    :param progress: progress bar object
    :param afile: file to process
    :param hasher: checksum algorithm
    :param blocksize: size of the buffer
    :return: hash digest
    """
    buf = afile.read(blocksize)
    while len(buf) > 0:
        self.processed_size += buf  # need to convert from bytes to file size
        hasher.update(buf)
        progress.update(self.processed_size)  # tqdm update
        buf = afile.read(blocksize)
    afile.seek(0)
    return hasher.digest()

def process_file(self, progress, fichier):
    """
    Checks if the file is in the reference dictionary;
    If so, checks if the size of the file matches the one stored in the dictionary;
    If so, calculates the checksum of the file and compares it to the one in the dictionary
    :param progress: progress bar object
    :param fichier: asset file to process
    :return: string outcome of the process
    """
    checksum = self.hashfile(progress, open(fichier, 'rb'), hashlib.sha1())
    # check if checksum matches
    return outcome

def main_process(self):
    """
    Launches and monitors the process and writes a report of the results
    :return: application end
    """
    with tqdm(total=self.assets_size, unit='B', unit_scale=True) as pbar:
        all_results = []
        for f in self.assets.keys():
            results = self.process_file(pbar, f)
            all_results.append(results)
    for r in all_results:
        print(r)
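Thanks to @RadosławCybulski, I found the solution. I now understand how tqdm.update() works: it does not set the progress to the value of its argument, it adds the argument to the current progress. I updated the loop in the hashfile method like this:
while len(buf) > 0:
    hasher.update(buf)
    progress.update(len(buf))
    buf = afile.read(blocksize)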
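To make the difference concrete, here is a minimal, self-contained sketch of the corrected loop next to the original mistake. Several things in it are assumptions for illustration: hashfile is written as a standalone function rather than a method, the input is an in-memory io.BytesIO with made-up content, and ByteCounter is a hypothetical stand-in that only mimics the additive behaviour of tqdm.update() so that the example runs without tqdm installed. A real bar created with tqdm(total=..., unit='B', unit_scale=True) could presumably be passed in its place.

```python
import hashlib
import io


class ByteCounter:
    """Hypothetical stand-in for a tqdm bar: update(n) adds n to the count."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, n):
        self.n += n  # increments the count; does not set an absolute value


def hashfile(progress, afile, hasher, blocksize=65536):
    """Hash afile chunk by chunk, advancing progress by each chunk's byte count."""
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        progress.update(len(buf))  # pass the increment, not a running total
        buf = afile.read(blocksize)
    afile.seek(0)
    return hasher.digest()


data = b"x" * 200_000  # made-up content, 200,000 bytes
good = ByteCounter(total=len(data))
digest = hashfile(good, io.BytesIO(data), hashlib.sha1())
print(good.n == good.total)                   # True: ends exactly at the total
print(digest == hashlib.sha1(data).digest())  # True: chunked hash matches

# The original mistake: passing the running total to update() re-counts
# every earlier chunk on each iteration.
bad = ByteCounter(total=len(data))
processed = 0
for start in range(0, len(data), 65536):
    processed += len(data[start:start + 65536])
    bad.update(processed)
print(bad.n, bad.total)                       # 593216 200000
```

With four chunks, the running totals 65536, 131072, 196608 and 200000 are each added to the bar, which is why the processed size ended up far larger than the total.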