Python ZipFile Created In-Memory Not Compressing as Expected

I'm trying to create a ZipFile object in memory with Python, write a single file (also created in memory) into it, and then upload the result to Google Cloud Storage.

However, the file isn't actually being compressed. Any idea what I might be doing wrong?

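I realize there may be a fancier way to get the row data into a file object, but setting that aside, I really just want to figure out why the resulting zip file isn't compressed at all.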

Update: the code example now excludes any interaction with Google Cloud services (GCS etc.) and simply writes the files to disk.

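It seems that if I write the file to disk first and then create the ZipFile, the result is compressed as expected, but when I add the StringIO contents to the ZipFile object directly from memory, the contents are not compressed.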

import random, io, argparse, os, string
from zipfile import ZipFile, ZipInfo, ZIP_DEFLATED

parser = argparse.ArgumentParser()
parser.add_argument("--row_limit", default=1000, type=int)
parser.add_argument("--file_name", default='file.txt', type=str)
parser.add_argument("--archive_name", default='file.zip', type=str)
parser.add_argument("--snapshot_millis", default=0, type=int)
args = parser.parse_args()

# imagine this has lots and lots of data in it, coming from a database query result
rows = [{
    'seq_no': ''.join(random.choices(string.ascii_uppercase + string.digits, k=args.row_limit)),
    'csv': ''.join(random.choices(string.ascii_uppercase + string.digits, k=args.row_limit))
}] * args.row_limit

archive = io.BytesIO()
# create zip archive in memory
with ZipFile(archive, 'w', compression=ZIP_DEFLATED, compresslevel=9) as zip_archive:
    count = 0
    file_contents = io.StringIO()
    for row in rows:
        if count > args.row_limit:
            break
        count += 1
        file_contents.write(f"{row['seq_no']},{row['csv']}\n")

    # write file to zip archive in memory
    zip_file = ZipInfo(args.file_name)
    zip_archive.writestr(zip_file, file_contents.getvalue())

    # also write file to disk
    with open(args.file_name, mode='w') as f:
        print(file_contents.getvalue(), file=f)

    print(f"StringIO Size: {file_contents.tell()}")
    print(f"Text File Size On Disk: {os.path.getsize(args.file_name)}")

archive.seek(0)

with open(args.archive_name, 'wb') as outfile:
    outfile.write(archive.getbuffer())

print(f"Zip File Created from File In Memory: {os.path.getsize(args.archive_name)}")

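# recreate the archive from the file on disk; the context manager closes it before the size is read
with ZipFile(args.archive_name, mode='w', compression=ZIP_DEFLATED, compresslevel=9) as disk_archive:
    disk_archive.write(args.file_name)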

print(f"Zip File Created from File On Disk: {os.path.getsize(args.archive_name)}")

The problem is here:

zip_file = ZipInfo(args.file_name)
zip_archive.writestr(zip_file, file_contents.getvalue())

From the ZipFile.writestr docs:

When passing a ZipInfo instance as the zinfo_or_arcname parameter, the compression method used will be that specified in the compress_type member of the given ZipInfo instance. By default, the ZipInfo constructor sets this member to ZIP_STORED [i.e. uncompressed].
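The default described in that passage is easy to check directly. A minimal sketch, where "file.txt" is only a placeholder name and not taken from a real archive:

```python
from zipfile import ZipInfo, ZIP_STORED, ZIP_DEFLATED

# A freshly constructed ZipInfo carries its own compression setting,
# independent of whatever ZipFile it is later written into.
info = ZipInfo("file.txt")  # placeholder file name
default_is_stored = info.compress_type == ZIP_STORED
print(f"default compress_type is ZIP_STORED: {default_is_stored}")

# The member is a plain attribute, so it can be changed before writing.
info.compress_type = ZIP_DEFLATED
now_deflated = info.compress_type == ZIP_DEFLATED
print(f"after assignment it is ZIP_DEFLATED: {now_deflated}")
```

So the `compression=ZIP_DEFLATED` argument given to the ZipFile constructor is never consulted for an entry written through a bare ZipInfo.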

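The simplest fix is to skip the full ZipInfo and pass just the file name. This also sets the current date/time as the timestamp of the file inside the archive (a bare ZipInfo defaults to 1980):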

# zip_file = ZipInfo(args.file_name)
zip_archive.writestr(args.file_name, file_contents.getvalue())
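If you do need a ZipInfo (for example to control the timestamp or other metadata), two alternatives should also work: set `compress_type` on the ZipInfo yourself, or pass `compress_type` to `writestr`. The sketch below compares all four variants in memory. The sample data and the `write_entry` helper are hypothetical and exist only for illustration:

```python
import io
from zipfile import ZipFile, ZipInfo, ZIP_DEFLATED

# Hypothetical sample data: repetitive text that deflates well.
data = "ABC123,XYZ789\n" * 10_000


def write_entry(entry, **kwargs):
    """Write `data` under `entry` into an in-memory archive and return its metadata."""
    buf = io.BytesIO()
    with ZipFile(buf, "w", compression=ZIP_DEFLATED, compresslevel=9) as zf:
        zf.writestr(entry, data, **kwargs)
    # Re-open the finished archive to read back what was actually stored.
    with ZipFile(buf) as zf:
        return zf.getinfo("file.txt")


# 1. Bare ZipInfo: stored uncompressed, as in the question.
stored = write_entry(ZipInfo("file.txt"))

# 2. Name only: the entry inherits ZIP_DEFLATED from the archive.
by_name = write_entry("file.txt")

# 3. ZipInfo with compress_type set explicitly before writing.
info = ZipInfo("file.txt")
info.compress_type = ZIP_DEFLATED
explicit = write_entry(info)

# 4. Bare ZipInfo, with compress_type passed to writestr as an override.
overridden = write_entry(ZipInfo("file.txt"), compress_type=ZIP_DEFLATED)

results = {
    "bare ZipInfo": stored,
    "name only": by_name,
    "ZipInfo + compress_type": explicit,
    "writestr(compress_type=...)": overridden,
}
for label, result in results.items():
    print(f"{label}: {result.compress_size} of {result.file_size} bytes, "
          f"year {result.date_time[0]}")
```

Note that variants 3 and 4 still carry the 1980 default timestamp unless you also pass a `date_time` tuple to the ZipInfo constructor.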