Memory-efficient way to write an uncompressed file from a gzip file

Question

Using Python 3.5

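I am decompressing a gzip file and writing the result to another file. While investigating an out-of-memory problem, I found this example in the documentation of the gzip module: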

import gzip
import shutil
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

That example compresses, but I want to decompress, so I figured I could just reverse the pattern, giving:

with open(unzipped_file, 'wb') as f_out, gzip.open(zipped_file, 'rb') as f_in:
    shutil.copyfileobj(f_in, f_out)
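The reversed pattern can be checked end to end. A minimal round-trip sketch, assuming temporary files in place of the question's paths and a hypothetical helper name gunzip_to_file: it compresses a file the way the documentation example does, then decompresses it with copyfileobj and compares the result with the original.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_to_file(zipped_file, unzipped_file, length=64 * 1024):
    """Decompress zipped_file into unzipped_file, `length` bytes at a time."""
    with gzip.open(zipped_file, 'rb') as f_in, open(unzipped_file, 'wb') as f_out:
        # copyfileobj reads at most `length` bytes per call, so memory use is
        # roughly one buffer (plus gzip's internal buffers), not the whole file.
        shutil.copyfileobj(f_in, f_out, length)


# Hypothetical paths inside a throwaway temporary directory.
tmpdir = tempfile.mkdtemp()
original = os.path.join(tmpdir, 'file.txt')
zipped = original + '.gz'
restored = os.path.join(tmpdir, 'file_restored.txt')

payload = b'line of text\n' * 100000
with open(original, 'wb') as f:
    f.write(payload)

# Compress exactly as in the gzip documentation example.
with open(original, 'rb') as f_in, gzip.open(zipped, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

# Decompress with the reversed pattern and compare.
gunzip_to_file(zipped, restored)
with open(restored, 'rb') as f:
    roundtrip_ok = f.read() == payload

shutil.rmtree(tmpdir)  # clean up the temporary files
```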

My question is: why do I run into memory problems with the following?

with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    wout.write(zin.read())

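Either I'm at the end of my rope, or I was naive to think these file objects would behave like generators and stream the decompression, using very little memory. Should these two approaches be equivalent?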
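The generator intuition is partly right: binary file objects, including the one gzip.open returns, are iterable and yield one line at a time. The catch is that read() with no argument bypasses this and returns everything at once. A minimal sketch of line-wise streaming, with a hypothetical helper name gunzip_by_lines and in-memory streams standing in for real files; note that one very long line, or binary data with few b'\n' bytes, would still be held in memory whole.

```python
import gzip
import io


def gunzip_by_lines(fsrc, fdst):
    """Copy decompressed data line by line; only one line is held at a time."""
    with gzip.open(fsrc, 'rb') as zin:
        for line in zin:  # iterating a GzipFile yields bytes lines lazily
            fdst.write(line)


text = b''.join(('row %d\n' % i).encode('ascii') for i in range(1000))
src = io.BytesIO(gzip.compress(text))  # gzip.open also accepts file objects
dst = io.BytesIO()
gunzip_by_lines(src, dst)
```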

Answer 1

Here is the shutil.copyfileobj function:

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

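It reads the file in chunks of 16*1024 bytes (16 KiB), so only one chunk is in memory at a time. When you reversed the process with wout.write(zin.read()), you didn't account for the file size: read() with no argument decompresses the entire file into a single bytes object in memory before anything is written, and that is what runs you into memory problems.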
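The difference can be seen directly by comparing one read() call with a read(length) loop of the kind copyfileobj uses. A small sketch on an in-memory gzip stream (the 500,000-byte payload is an arbitrary assumption): the first approach produces one object the size of the whole payload, while each chunk in the second is at most 16 KiB.

```python
import gzip
import io

data = b'0123456789' * 50000  # 500,000 bytes once decompressed
compressed = gzip.compress(data)

# read() with no argument: one bytes object holding the entire payload.
with gzip.open(io.BytesIO(compressed), 'rb') as zin:
    whole = zin.read()

# read(length) in a loop, as copyfileobj does: each chunk is bounded by `length`.
length = 16 * 1024
chunk_sizes = []
with gzip.open(io.BytesIO(compressed), 'rb') as zin:
    while True:
        buf = zin.read(length)
        if not buf:
            break
        chunk_sizes.append(len(buf))

print(len(whole), max(chunk_sizes), sum(chunk_sizes))
```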

Answer 2

Instead of the memory-hungry (and naive)

import gzip
with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    wout.write(zin.read())

based on the previous answer, I tested this:

import gzip

block_size = 64 * 1024  # read 64 KiB of decompressed data at a time
with gzip.open(zipped_file, 'rb') as zin, open(unzipped_file, 'wb') as wout:
    while True:
        uncompressed_block = zin.read(block_size)
        if not uncompressed_block:
            break  # empty bytes means end of file
        wout.write(uncompressed_block)

Verified on a 4.8 GB file.
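The same block loop can be written more compactly with the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. A sketch, with the hypothetical helper name gunzip_stream, exercised on in-memory streams rather than the 4.8 GB file from the answer:

```python
import functools
import gzip
import io


def gunzip_stream(fsrc, fdst, block_size=64 * 1024):
    """Copy decompressed data from fsrc to fdst one block at a time."""
    with gzip.open(fsrc, 'rb') as zin:
        # iter() calls zin.read(block_size) repeatedly until it returns b'' (EOF).
        for block in iter(functools.partial(zin.read, block_size), b''):
            fdst.write(block)


payload = bytes(range(256)) * 4000
out = io.BytesIO()
gunzip_stream(io.BytesIO(gzip.compress(payload)), out)
```

Both forms hold roughly one block of decompressed data at a time; choosing between the explicit while loop and iter() is a matter of style.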
