Convert/write a PDF to RAM as a file-like object for further work with it
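My script generates a PDF (a PyPDF2.pdf.PdfFileWriter object) and stores it in a variable. Later in the script I need to work with it as a file-like object, but at the moment I first have to write it to the hard disk and then open it again as a file before I can use it.
To avoid this unnecessary writing/reading, I found plenty of suggested solutions (StringIO, BytesIO, and so on), but none of them seemed to help in my case.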
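For reference, the two classes mentioned differ in what they accept: io.StringIO holds text (str) only, while io.BytesIO holds bytes. A PDF is binary data, so only BytesIO fits this case. The standard-library check below illustrates the difference; the byte string is an arbitrary placeholder for real PDF output, and try_write is a hypothetical helper.

```python
import io

# A few bytes standing in for real PDF output (arbitrary placeholder data).
pdf_like = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def try_write(stream, data):
    """Return the exception type raised by stream.write(data), or None on success."""
    try:
        stream.write(data)
    except TypeError as exc:
        return type(exc)
    return None


text_error = try_write(io.StringIO(), pdf_like)   # str-only buffer rejects bytes
binary_error = try_write(io.BytesIO(), pdf_like)  # byte buffer accepts bytes

print("StringIO:", text_error)
print("BytesIO:", binary_error)
```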
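As I understand it, I need to "convert" the PyPDF2.pdf.PdfFileWriter object into a file-like object (that is, write it to RAM) so I can use it directly.
Or is there another approach that fits my case better?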
Update: here is a code example:
```python
from pdfrw import PdfReader, PdfWriter, PageMerge
from PyPDF2 import PdfFileReader, PdfFileWriter

red_file = PdfFileReader(open("file_name.pdf", 'rb'))
large_pages_indexes = [1, 7, 9]
large = PdfFileWriter()
for i in large_pages_indexes:
    p = red_file.getPage(i)
    large.addPage(p)

# here final data have to be written (I would like to avoid that)
with open("virtual_file.pdf", 'wb') as tmp:
    large.write(tmp)

# here I need to read exported "virtual_file.pdf" (I would like to avoid that too)
with open("virtual_file.pdf", 'rb') as tmp:
    pdf = PdfReader(tmp)  # here I'm starting to work with this file using another module "pdfrw"

print(pdf)
```
To avoid the slow disk I/O, it looks like you want to replace

```python
with open("virtual_file.pdf", 'wb') as tmp:
    large.write(tmp)
with open("virtual_file.pdf", 'rb') as tmp:
    pdf = PdfReader(tmp)
```

with
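```python
import io

buf = io.BytesIO()  # in-memory binary buffer instead of a file on disk
large.write(buf)
buf.seek(0)         # rewind so PdfReader starts reading at the first byte
pdf = PdfReader(buf)
```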
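The buf.seek(0) call matters: after large.write(buf) the stream position sits at the end of the buffer, so a reader that starts from the current position would see no data. The sketch below wraps the pattern in a small helper and shows the effect using only the standard library. The helper name writer_to_buffer and the FakeWriter class are hypothetical; FakeWriter only mimics the one method used here, a write(stream) that writes bytes to the given stream, as PdfFileWriter.write does.

```python
import io


class FakeWriter:
    """Hypothetical stand-in for PdfFileWriter: only provides write(stream)."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload

    def write(self, stream) -> None:
        stream.write(self.payload)


def writer_to_buffer(writer) -> io.BytesIO:
    """Write `writer` into an in-memory buffer and rewind it for reading."""
    buf = io.BytesIO()
    writer.write(buf)
    buf.seek(0)  # move back to the first byte so readers see the whole document
    return buf


writer = FakeWriter(b"%PDF-1.4 placeholder content")

# Without rewinding, reading starts at the end of the buffer and returns nothing.
unrewound = io.BytesIO()
writer.write(unrewound)
print(unrewound.read())  # b''

# With the helper, the buffer behaves like a freshly opened file.
buf = writer_to_buffer(writer)
print(buf.read(8))       # b'%PDF-1.4'
```

With the objects from the question this would presumably become pdf = PdfReader(writer_to_buffer(large)), assuming pdfrw's PdfReader reads from the file-like object it is given, as the answer's code relies on.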
buf.getvalue() is also available if you need the PDF's contents as a bytes object.
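Unlike read(), BytesIO.getvalue() returns the buffer's entire contents regardless of the current stream position, so it works even without a seek(0). A minimal standard-library illustration follows; the placeholder bytes stand in for what large.write(buf) would produce. The bytes can also be wrapped in a new io.BytesIO, which starts at position 0, whenever another step needs a fresh file-like object.

```python
import io

buf = io.BytesIO()
buf.write(b"%PDF-1.4 placeholder content")  # placeholder for large.write(buf)

# No seek(0) here: getvalue() ignores the position and returns every byte written.
data = buf.getvalue()

# Re-wrap the bytes in a new buffer whenever a consumer needs its own stream.
second_copy = io.BytesIO(data)

print(len(data))           # 28
print(second_copy.read(8)) # b'%PDF-1.4'
```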