How to merge two tar.gz BinaryIO streams in Python 3
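I have two binary I/O streams (both inherit from BufferedIOBase) that represent two tar archives compressed with gzip.
Is there an efficient way to create a third stream that is the combination of the other two?
I tried turning both streams into tarfile.TarFile objects via the fileobj parameter and adding each member to a third archive: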
import io
import tarfile
from typing import BinaryIO, Optional


def merge_environment(a: Optional[BinaryIO], b: Optional[BinaryIO]) -> Optional[BinaryIO]:
    """Merge <a> and <b>, returning a new gzip-compressed tar stream.

    If two files in <a> and <b> have the same name, the one in <a> prevails."""
    destio = io.BytesIO()
    with tarfile.open(fileobj=a, mode="r:gz") as t1, \
         tarfile.open(fileobj=b, mode="r:gz") as t2, \
         tarfile.open(fileobj=destio, mode="w:gz") as dest:
        t1_members = [m for m in t1.getmembers() if m.name != ""]
        t1_names = [m.name for m in t1.getmembers()]
        # Take from <b> only the entries whose names are not already in <a>.
        t2_members = [m for m in t2.getmembers() if m.name != "" and m.name not in t1_names]
        for member in t1_members:
            dest.addfile(member, t1.extractfile(member))
        for member in t2_members:
            dest.addfile(member, t2.extractfile(member))
    destio.seek(0, 0)
    return destio
However, this does not work if either tar.gz contains directories, because TarFile.extractfile() does not work for directories.
Is there a workaround?
The solution is simple: the fileobj parameter of TarFile.addfile is optional, so for a directory the tarinfo alone is enough:
destio = io.BytesIO()
with tarfile.open(fileobj=a, mode="r:gz") as t1, \
     tarfile.open(fileobj=b, mode="r:gz") as t2, \
     tarfile.open(fileobj=destio, mode="w:gz") as dest:
    t1_members = t1.getmembers()
    t1_names = t1.getnames()
    # Entries from <b> are kept only if <a> has no entry with the same name.
    t2_members = [m for m in t2.getmembers() if m.name not in t1_names]
    for member in t1_members:
        if member.isdir():
            # Directories carry no data, so only the header is added.
            dest.addfile(member)
        else:
            dest.addfile(member, t1.extractfile(member))
    for member in t2_members:
        if member.isdir():
            dest.addfile(member)
        else:
            dest.addfile(member, t2.extractfile(member))