PyPDF2: append a PDF starting from the 2nd page
I'm learning to program with the book "Automate the Boring Stuff", but I've hit a roadblock in chapter 13:
"Merge multiple PDF's, but omit the title page from all but the first page"
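The book does this by looping over the PDFs, but while looking through the PyPDF2 module I found that the 'pages' option looks like a cleaner solution. However, I'm struggling to get it to work.
Please don't mind whether it's Pythonic or not. I haven't learned classes yet ;-) Once I finish the book, I plan to start on classes, objects, decorators, *args and **kwargs ;-)
I need help with the last line of my code snippet.
My code: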
for fn_PdfObjects in range(len(l_fn_PdfObjects)):
    if fn_PdfObjects != 0:
        break
    else:
        ## watermark the first sheet
        addWatermark(l_fn_PdfObjects[fn_PdfObjects])
        watermarkedPage = PyPDF2.PdfFileReader(open('watermarkedCover.pdf', 'rb'))
        # the 'position = ' is the page in the destination PDF it will receive
        tempMergerFile.merge(position=fn_PdfObjects, fileobj=watermarkedPage)
        tempMergerFile.merge(position=fn_PdfObjects+1, fileobj=l_fn_PdfObjects[fn_PdfObjects],pages='0:')
While looking through the module, I found this:
Source: https://pythonhosted.org/PyPDF2/PdfFileMerger.html
merge(position, fileobj, bookmark=None, pages=None, import_bookmarks=True)
pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.
I also found something about page ranges, but whatever I try, I can't get it to work:
Source: https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/pagerange.py
class PageRange(object):
    """
    A slice-like representation of a range of page indices,
        i.e. page numbers, only starting at zero.
    The syntax is like what you would put between brackets [ ].
    The slice is one of the few Python types that can't be subclassed,
    but this class converts to and from slices, and allows similar use.
      o  PageRange(str) parses a string representing a page range.
      o  PageRange(slice) directly "imports" a slice.
      o  to_slice() gives the equivalent slice.
      o  str() and repr() allow printing.
      o  indices(n) is like slice.indices(n).
    """
    def __init__(self, arg):
        """
        Initialize with either a slice -- giving the equivalent page range,
        or a PageRange object -- making a copy,
        or a string like
            "int", "[int]:[int]" or "[int]:[int]:[int]",
            where the brackets indicate optional ints.
        {page_range_help}
        Note the difference between this notation and arguments to slice():
            slice(3) means the first three pages;
            PageRange("3") means the range of only the fourth page.
            However PageRange(slice(3)) means the first three pages.
        """
The error I get is:
TypeError: "pages" must be a tuple of (start, stop[, step])
Traceback (most recent call last):
  File "combining_select_pages_from_many_pdfs.py", line 112, in <module>
    main()
  File "combining_select_pages_from_many_pdfs.py", line 104, in main
    newPdfFile = mergePdfFiles(l_PdfObjects)
  File "combining_select_pages_from_many_pdfs.py", line 63, in mergePdfFiles
    tempMergerFile.merge(position=fn_PdfObjects+1, fileobj=l_fn_PdfObjects[fn_PdfObjects],pages=[0])
  File "/home/sybie/.local/lib/python3.5/site-packages/PyPDF2/merger.py", line 143, in merge
    raise TypeError('"pages" must be a tuple of (start, stop[, step])')
What I could find in the library source is:
# Find the range of pages to merge.
if pages == None:
    pages = (0, pdfr.getNumPages())
elif isinstance(pages, PageRange):
    pages = pages.indices(pdfr.getNumPages())
elif not isinstance(pages, tuple):
    raise TypeError('"pages" must be a tuple of (start, stop[, step])')
Source: https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/merger.py#L137
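The quoted check explains both failed attempts: a plain string such as '0:' and a list such as [0] are neither None, nor a PageRange, nor a tuple, so they fall through to the TypeError. Here is a minimal sketch of that branch order using only the standard library. FakePageRange, resolve_pages and accepted are hypothetical names made up for this illustration; they are not part of PyPDF2, and FakePageRange only wraps a slice instead of parsing a string.

```python
class FakePageRange:
    """Hypothetical stand-in for PyPDF2's PageRange; it only wraps a slice."""

    def __init__(self, page_slice):
        self.page_slice = page_slice

    def indices(self, n):
        # Same contract as slice.indices(n): returns (start, stop, step).
        return self.page_slice.indices(n)


def resolve_pages(pages, num_pages):
    """Follow the branch order of the quoted check and return a page tuple."""
    if pages is None:
        return (0, num_pages)
    if isinstance(pages, FakePageRange):
        return pages.indices(num_pages)
    if not isinstance(pages, tuple):
        raise TypeError('"pages" must be a tuple of (start, stop[, step])')
    return pages


def accepted(pages, num_pages=5):
    """Return True if the check lets this value through, False on TypeError."""
    try:
        resolve_pages(pages, num_pages)
        return True
    except TypeError:
        return False


# None, a tuple and a page-range object pass; a bare string or a list do not.
for candidate in (None, (1, 5), FakePageRange(slice(1, None)), "0:", [0]):
    print(type(candidate).__name__, accepted(candidate))
```

So, judging by the quoted source, a range string would have to be wrapped in a PageRange object before being passed as pages; passing the bare string is rejected.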
Thanks in advance for your help!
I solved the problem by doing this:
pages=(1,l_fn_PdfObjects[fn_PdfObjects].numPages)
In other words, I passed it as a tuple.
If anyone can still explain how page ranges work, I'd be grateful!
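As a rough illustration of how the page-range strings described in the quoted docstring relate to slices, here is a small standard-library sketch. page_range_to_slice and selected_pages are hypothetical helpers written for this post, not PyPDF2 functions; they only assume the "int", "[int]:[int]" and "[int]:[int]:[int]" syntax from the docstring.

```python
def page_range_to_slice(text):
    """Turn "int", "[int]:[int]" or "[int]:[int]:[int]" into a slice."""
    parts = text.split(":")
    if len(parts) == 1:
        # A bare "3" means only the page with index 3, i.e. the fourth page.
        index = int(parts[0])
        # "-1" needs an open end, because slice(-1, 0) would select nothing.
        stop = index + 1 if index != -1 else None
        return slice(index, stop)
    if len(parts) > 3:
        raise ValueError("too many ':' in page range: " + repr(text))
    # Empty fields mean "no bound", exactly as inside brackets [ ].
    return slice(*[int(p) if p.strip() else None for p in parts])


def selected_pages(text, num_pages):
    """List the zero-based page indices a range string selects."""
    start, stop, step = page_range_to_slice(text).indices(num_pages)
    return list(range(start, stop, step))


print(selected_pages("1:", 5))   # [1, 2, 3, 4]  everything except the first page
print(selected_pages("0:1", 5))  # [0]           only the first page
print(selected_pages("3", 5))    # [3]           only the fourth page
print(selected_pages("::2", 5))  # [0, 2, 4]     every other page
```

For a 5-page document, "1:" selects the same pages as the tuple (1, 5), which is why the tuple above skips the title page.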
It looks like you need the parse_filename_page_ranges function. Roughly like this:
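import sys
import traceback

from PyPDF2 import PdfFileMerger, parse_filename_page_ranges

# records_pdf, inv_pdf and destinationfile are file paths defined elsewhere.
# A filename followed by a range string takes only those pages;
# a filename with no range after it takes the whole file.
args = [records_pdf, '0:1', inv_pdf, records_pdf, '1:']
filename_page_ranges = parse_filename_page_ranges(args)
merger = PdfFileMerger()
in_fs = dict()  # open each input file only once, even if it is listed twice
try:
    for (filename, page_range) in filename_page_ranges:
        if filename not in in_fs:
            in_fs[filename] = open(filename, "rb")
        merger.append(in_fs[filename], pages=page_range)
except Exception:
    print(traceback.format_exc(), file=sys.stderr)
    print("Error while reading " + filename, file=sys.stderr)
    sys.exit(1)
with open(destinationfile, "wb") as output:
    merger.write(output)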
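To make the argument list easier to read, here is a simplified sketch of how a flat list of filenames and range strings can be grouped into (filename, range) pairs. This is not PyPDF2's implementation: pair_filenames_with_ranges and RANGE_RE are hypothetical names, the ranges stay plain strings rather than PageRange objects, and treating a filename without a range as ":" (all pages) is my understanding of the intended behaviour rather than something shown in this thread.

```python
import re

# Assumed range syntax, taken from the docstring quoted in the question:
# "int", "[int]:[int]" or "[int]:[int]:[int]".
RANGE_RE = re.compile(r"^(-?\d+|(-?\d+)?(:(-?\d+)?){1,2})$")


def pair_filenames_with_ranges(args):
    """Group ['a.pdf', '0:1', 'b.pdf'] into [('a.pdf', '0:1'), ('b.pdf', ':')]."""
    pairs = []
    filename = None
    had_range = False
    # The trailing None flushes the last filename if no range followed it.
    for arg in list(args) + [None]:
        if arg is not None and RANGE_RE.match(arg):
            if filename is None:
                raise ValueError("the first argument must be a filename")
            pairs.append((filename, arg))
            had_range = True
        else:
            if filename is not None and not had_range:
                pairs.append((filename, ":"))  # no range given: all pages
            filename = arg
            had_range = False
    return pairs


# Placeholder filenames, mirroring the shape of the args list above.
print(pair_filenames_with_ranges(
    ["records.pdf", "0:1", "inv.pdf", "records.pdf", "1:"]))
```

Read this way, the args list above means: the first page of the records file, then the whole invoice file, then the rest of the records file.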