Python ZipFile Created In-Memory Not Compressing as Expected
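I'm trying to create a ZipFile object in memory with Python, write a single file (also created in memory) into it, and then upload the archive to Google Cloud Storage.
However, the file isn't actually being compressed. Any idea what I might be doing wrong?
I realize there may be a fancier way to get the row data into a file object, but that aside, I'm really just trying to figure out why the resulting zip file isn't compressed at all.
Update: the code sample now leaves out any interaction with Google Cloud services (GCS, etc.) and simply writes the files to disk.
It seems that if I write the file to disk first and then create the ZipFile, the result is compressed as expected, but when I add the StringIO contents to the ZipFile object directly from memory, the contents are not compressed.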
import random, io, argparse, os, string
from zipfile import ZipFile, ZipInfo, ZIP_DEFLATED

parser = argparse.ArgumentParser()
# type=int so a value passed on the command line is not left as a string
parser.add_argument("--row_limit", default=1000, type=int)
parser.add_argument("--file_name", default='file.txt', type=str)
parser.add_argument("--archive_name", default='file.zip', type=str)
parser.add_argument("--snapshot_millis", default=0, type=int)
args = parser.parse_args()

# imagine this has lots and lots of data in it, coming from a database query result
rows = [{
    'seq_no': ''.join(random.choices(string.ascii_uppercase + string.digits, k=args.row_limit)),
    'csv': ''.join(random.choices(string.ascii_uppercase + string.digits, k=args.row_limit))
}] * args.row_limit

archive = io.BytesIO()

# create zip archive in memory
with ZipFile(archive, 'w', compression=ZIP_DEFLATED, compresslevel=9) as zip_archive:
    count = 0
    file_contents = io.StringIO()
    for row in rows:
        if count > args.row_limit:
            break
        count += 1
        file_contents.write(f"{row['seq_no']},{row['csv']}\n")

    # write file to zip archive in memory
    zip_file = ZipInfo(args.file_name)
    zip_archive.writestr(zip_file, file_contents.getvalue())

    # also write file to disk
    with open(args.file_name, mode='w') as f:
        print(file_contents.getvalue(), file=f)

    print(f"StringIO Size: {file_contents.tell()}")
    print(f"Text File Size On Disk: {os.path.getsize(args.file_name)}")

archive.seek(0)

with open(args.archive_name, 'wb') as outfile:
    outfile.write(archive.getbuffer())

print(f"Zip File Created from File In Memory: {os.path.getsize(args.archive_name)}")

# close the archive before measuring it, so the central directory is flushed
with ZipFile(args.archive_name, mode='w', compression=ZIP_DEFLATED, compresslevel=9) as disk_archive:
    disk_archive.write(args.file_name)
print(f"Zip File Created from File On Disk: {os.path.getsize(args.archive_name)}")
The problem is here:
zip_file = ZipInfo(args.file_name)
zip_archive.writestr(zip_file, file_contents.getvalue())
From the ZipFile.writestr documentation:
When passing a ZipInfo instance as the zinfo_or_arcname parameter, the
compression method used will be that specified in the compress_type
member of the given ZipInfo instance. By default, the ZipInfo
constructor sets this member to ZIP_STORED [i.e. uncompressed].
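The behaviour described in that passage can be checked directly. The following is a minimal sketch, assuming a made-up repetitive payload and the entry name file.txt (neither comes from the question); it inspects the entry after writing it through a bare ZipInfo:

```python
import io
from zipfile import ZipFile, ZipInfo, ZIP_DEFLATED, ZIP_STORED

# Hypothetical, highly repetitive payload, so deflate would shrink it a lot.
data = ("ABC123," * 1000 + "\n") * 100

info = ZipInfo("file.txt")
# A freshly constructed ZipInfo carries ZIP_STORED as its compress_type.
default_is_stored = info.compress_type == ZIP_STORED

buffer = io.BytesIO()
with ZipFile(buffer, "w", compression=ZIP_DEFLATED, compresslevel=9) as zf:
    # The archive-level ZIP_DEFLATED setting does not apply to this entry,
    # because the ZipInfo instance brings its own compress_type.
    zf.writestr(info, data)

# Re-open the in-memory archive and look at what was actually recorded.
with ZipFile(buffer) as zf:
    entry = zf.getinfo("file.txt")

print("ZipInfo default is ZIP_STORED:", default_is_stored)
print("entry stored uncompressed:", entry.compress_type == ZIP_STORED)
print("file_size:", entry.file_size, "compress_size:", entry.compress_size)
```

For a stored entry, compress_size and file_size are equal, which matches the symptom in the question.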
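The simplest way to correct this is to not use a full ZipInfo at all, but only the file name. This also sets the current date/time as the creation time of the file inside the archive (ZipInfo defaults to 1980):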
# zip_file = ZipInfo(args.file_name)
zip_archive.writestr(args.file_name, file_contents.getvalue())
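If a ZipInfo is still wanted (for example to control the timestamp), two other options appear to work as well: setting compress_type on the ZipInfo, or passing compress_type to writestr. The sketch below compares all of these against the original call. The payload, the entry name, and the helper functions are made up for illustration and are not from the question:

```python
import io
from zipfile import ZipFile, ZipInfo, ZIP_DEFLATED

# Hypothetical, highly repetitive payload standing in for the row data.
data = ("ABC123," * 1000 + "\n") * 100


def archive_size(write_entry):
    """Build a one-entry in-memory archive and return its size in bytes."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w", compression=ZIP_DEFLATED, compresslevel=9) as zf:
        write_entry(zf)
    return len(buffer.getvalue())


def write_stored(zf):
    # Original call: a bare ZipInfo defaults to ZIP_STORED.
    zf.writestr(ZipInfo("file.txt"), data)


def write_by_name(zf):
    # Fix from the answer: pass the name and inherit the archive's settings.
    zf.writestr("file.txt", data)


def write_with_info(zf):
    # Alternative: keep the ZipInfo but set its compression explicitly.
    info = ZipInfo("file.txt")
    info.compress_type = ZIP_DEFLATED
    zf.writestr(info, data)


def write_with_override(zf):
    # Alternative: override the ZipInfo's compression in the writestr call.
    zf.writestr(ZipInfo("file.txt"), data, compress_type=ZIP_DEFLATED)


sizes = {
    fn.__name__: archive_size(fn)
    for fn in (write_stored, write_by_name, write_with_info, write_with_override)
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes (payload is {len(data)} bytes)")
```

Only write_stored should produce an archive larger than the payload; the other three should come out far smaller.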