Joining big files in Python
I have several HEVC files that I'd like to merge. With small files (about 1.5 GB), the following code works fine:
    with open(path+"/"+str(sys.argv[2])+"_EL.265", "wb") as outfile:
        for fname in dirs:
            with open(path+"/"+fname, 'rb') as infile:
                outfile.write(infile.read())
With bigger files (8 GB or more), the same code gets stuck.
I copied the code for reading big files in chunks from here (Lazy Method for Reading Big File in Python?) and integrated it with my code:
    def read_in_chunks(file_object, chunk_size=1024):
        """Lazy function (generator) to read a file piece by piece.
        Default chunk size: 1k."""
        while True:
            data = file_object.read(chunk_size)
            if not data:
                break
            yield data

    with open(path + "/" + str(sys.argv[2]) + "_BL.265", "wb") as outfile_bl:
        for fname in dirs:
            with open(path+"/"+fname, 'rb') as infile:
                for piece in read_in_chunks(infile):
                    outfile_bl.write(infile.read())
This code produces a file of the right size, but it is no longer an HEVC file and cannot be read by a video player.
Any ideas? Please help.
Dario
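You are reading from infile in two different places: inside read_in_chunks, and directly in the call to outfile_bl.write. As a result, the data just read into the variable piece is never written. Because infile.read() with no argument returns everything that remains, the first chunk of each input file is dropped and the rest is still loaded into memory in one go.
You have already read the data into piece; just write that to your file.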
    with open(path + "/" + str(sys.argv[2]) + "_BL.265", "wb") as outfile_bl:
        for fname in dirs:
            with open(path+"/"+fname, 'rb') as infile:
                for piece in read_in_chunks(infile):
                    outfile_bl.write(piece)
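The effect of the double read can be reproduced in memory. This is a minimal sketch, assuming io.BytesIO objects stand in for the real files and using a hypothetical 4-byte chunk size to keep the data small; copy_buggy and copy_fixed are illustrative names, not part of the original code.

```python
import io


def read_in_chunks(file_object, chunk_size=1024):
    """Yield successive chunks until read() returns an empty bytes object."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


def copy_buggy(infile, outfile, chunk_size=4):
    # Mirrors the loop from the question: the chunk held in `piece` is
    # ignored and a second, unbounded read() is written instead.
    for piece in read_in_chunks(infile, chunk_size):
        outfile.write(infile.read())


def copy_fixed(infile, outfile, chunk_size=4):
    # Writes exactly what the generator yielded.
    for piece in read_in_chunks(infile, chunk_size):
        outfile.write(piece)


source = b"0123456789"

buggy_out = io.BytesIO()
copy_buggy(io.BytesIO(source), buggy_out)

fixed_out = io.BytesIO()
copy_fixed(io.BytesIO(source), fixed_out)

print(buggy_out.getvalue())  # b'456789'
print(fixed_out.getvalue())  # b'0123456789'
```

The buggy copy loses the first chunk (b'0123') and then stops, because the unbounded read() has already consumed the rest of the input.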
By the way, you don't really need to define read_in_chunks, or at least its definition can be greatly simplified by using iter:
    def read_in_chunks(file_object, chunk_size=1024):
        """Lazy function (generator) to read a file piece by piece.
        Default chunk size: 1k."""
        # The sentinel must be b'' because the files are opened in binary mode;
        # with '' the loop would never end (use '' only for text-mode files).
        yield from iter(lambda: file_object.read(chunk_size), b'')
        # Or
        # from functools import partial
        # yield from iter(partial(file_object.read, chunk_size), b'')
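If the helper is dropped altogether, the standard library's shutil.copyfileobj performs the same kind of chunked copy. The following is a sketch under assumptions: join_files and its parameters are hypothetical names that do not appear in the question, and the 1 MiB buffer size is an arbitrary choice.

```python
import shutil


def join_files(input_paths, output_path, buffer_size=1024 * 1024):
    """Concatenate the files in input_paths, in order, into output_path."""
    with open(output_path, "wb") as outfile:
        for input_path in input_paths:
            with open(input_path, "rb") as infile:
                # copyfileobj reads and writes buffer_size bytes at a time,
                # so memory use stays bounded regardless of file size.
                shutil.copyfileobj(infile, outfile, buffer_size)
```

The order of input_paths determines the order of the data in the output, so the list should be sorted the way the stream is meant to play back.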