Recommended way to redirect file-like streams in Python?
I'm writing a backup script for a sqlite database that changes intermittently. Here's how it looks now:
from bz2 import BZ2File
from datetime import datetime
from os.path import dirname, abspath, join
from hashlib import sha512

def backup_target_database(target):
    backup_dir = dirname(abspath(target))
    hash_file = join(backup_dir, 'last_hash')
    new_hash = sha512(open(target, 'rb').read()).digest()
    if new_hash != open(hash_file, 'rb').read():
        fmt = '%Y%m%d-%H%M.sqlite3.bz2'
        snapshot_file = join(backup_dir, datetime.now().strftime(fmt))
        BZ2File(snapshot_file, 'wb').write(open(target, 'rb').read())
        open(hash_file, 'wb').write(new_hash)
At the moment the database weighs in at just under 20 MB, so reading the whole file into memory when the script runs isn't that taxing (it happens twice when a change is detected), but I don't want to wait until this becomes a problem.

What is the proper way to do this kind of (in Bash-script terms) stream piping?
First, there is duplication in your code (you read the target file twice). Also, you can use shutil.copyfileobj and hashlib.update for a memory-efficient routine:
from bz2 import BZ2File
from datetime import datetime
from hashlib import sha512
from os.path import dirname, abspath, join
from shutil import copyfileobj

def backup_target_database(target_path):
    backup_dir = dirname(abspath(target_path))
    hash_path = join(backup_dir, 'last_hash')
    try:
        with open(hash_path, 'rb') as f:
            old_hash = f.read()
    except FileNotFoundError:
        # No previous hash yet (first run): force a snapshot.
        old_hash = None
    # Hash the database in chunks instead of reading it all at once.
    hasher = sha512()
    with open(target_path, 'rb') as target:
        while True:
            data = target.read(1024)
            if not data:
                break
            hasher.update(data)
    new_hash = hasher.digest()
    if new_hash != old_hash:
        fmt = '%Y%m%d-%H%M.sqlite3.bz2'
        snapshot_path = join(backup_dir, datetime.now().strftime(fmt))
        with open(target_path, 'rb') as target:
            with BZ2File(snapshot_path, 'wb', compresslevel=9) as snapshot:
                # Stream the file into the compressor chunk by chunk.
                copyfileobj(target, snapshot)
        # Record the new hash so the next run can detect further changes.
        with open(hash_path, 'wb') as f:
            f.write(new_hash)
(Note: I haven't tested this code. Please let me know if there are any problems.)
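As a side note, the manual while/break loop above can also be written with the two-argument form of iter(), which many consider more idiomatic for chunked reads. A minimal sketch (hash_file_chunked is a name made up for illustration, not part of the answer above):

```python
from functools import partial
from hashlib import sha512

def hash_file_chunked(path, chunk_size=65536):
    hasher = sha512()
    with open(path, 'rb') as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size)
        # until it returns b'' (end of file), so the whole file is
        # never held in memory at once.
        for chunk in iter(partial(f.read, chunk_size), b''):
            hasher.update(chunk)
    return hasher.digest()
```

This produces the same digest as hashing the whole file in one read, while keeping peak memory bounded by chunk_size.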