Merging PDF pages fails with PyPDF2

I have these demo files:

test.pdf: "Hello"
tomerge1.pdf: "1"
tomerge2.pdf: "2"

I would like output.pdf to contain two pages: "Hello 1" and "Hello 2".

This is what I am using:

from PyPDF2 import PdfFileWriter, PdfFileReader

outputpdf = PdfFileWriter()
inputpdf = PdfFileReader(open("test.pdf", "rb"))
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))

page = inputpdf.getPage(0)
page.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page)

# exit()
# if we stop here, the output is "Hello 1", which is good
# Why isn't "Hello 1" remembered here?
# del page    # doesn't change anything

page = inputpdf.getPage(0)
page.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page)

with open("output.pdf", "wb") as f:
    outputpdf.write(f)

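Unfortunately, this doesn't work: instead of "Hello 1" / "Hello 2", the output is "Hello 2" / "Hello 2".

Question: how can I get the expected behavior? (Ideally without the file size growing quickly when there are 10 or 20 pages.)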

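While doing a similar exercise, I found that you need one read per merge. The workaround is to create two readers for the input file ("test.pdf") that gets merged twice, one reader for each merge. Sample code: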

from PyPDF2 import PdfFileWriter, PdfFileReader

addressfile = open("Documents/addresses.pdf", "rb")
xwfile = "Downloads/input.pdf"
crosswordfile = open(xwfile, "rb")

# Two readers over the same input file, one per merge
xword = PdfFileReader(crosswordfile)
xw2 = PdfFileReader(crosswordfile)
addr = PdfFileReader(addressfile)

xwpage = xword.getPage(0)
xp2 = xw2.getPage(0)
addpage1 = addr.getPage(1)
addpage2 = addr.getPage(2)

# Each reader's copy of the input page gets a different address page merged onto it
xwpage.mergePage(addpage1)
xp2.mergePage(addpage2)

pdfWriter = PdfFileWriter()
pdfWriter.addPage(xwpage)
pdfWriter.addPage(xp2)

res = open("/home/paula/xw.pdf", "wb")
pdfWriter.write(res)
res.close()
crosswordfile.close()
addressfile.close()

So in your code it would look like this:

from PyPDF2 import PdfFileWriter, PdfFileReader

testfile = open("test.pdf", "rb")
outputpdf = PdfFileWriter()
inputpdf1 = PdfFileReader(testfile)
inputpdf2 = PdfFileReader(testfile)
tomerge1 = PdfFileReader(open("tomerge1.pdf", "rb"))
tomerge2 = PdfFileReader(open("tomerge2.pdf", "rb"))

page1 = inputpdf1.getPage(0)
page1.mergePage(tomerge1.getPage(0))
outputpdf.addPage(page1)

# exit()
# No need to stop here; the output will have both "Hello 1" and "Hello 2".
# Using two readers for the same file fools PyPDF2 into thinking they
# are two different files, i.e. that we are merging from two separate sources.

page2 = inputpdf2.getPage(0)
page2.mergePage(tomerge2.getPage(0))
outputpdf.addPage(page2)

with open("output.pdf", "wb") as f:
    outputpdf.write(f)
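
With 10 or 20 pages, one reader per merge gets unwieldy, so here is a rough sketch of a possible alternative. In the PyPDF2 1.x API used above, getPage(0) returns the same page object on every call and mergePage() modifies that object in place, which is why the second merge also shows up on the page that was already added. The sketch assumes that a shallow copy.copy() of the page before each merge is enough to keep the results separate; I have not checked this against every PyPDF2 version. The names merged_copies and build_output are made up for illustration and are not part of PyPDF2.

```python
import copy


def merged_copies(base_page, overlay_pages):
    """Return one merged page per overlay, leaving base_page itself unmodified."""
    results = []
    for overlay in overlay_pages:
        # Copy first: mergePage() changes the page it is called on.
        # Assumption: a shallow copy is sufficient for PyPDF2 page objects.
        page = copy.copy(base_page)
        page.mergePage(overlay)
        results.append(page)
    return results


def build_output(base_path, overlay_paths, out_path):
    """Hypothetical driver: merge page 0 of each overlay onto page 0 of base."""
    # Imported here so merged_copies() can be tried without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter  # PyPDF2 1.x names

    handles = [open(p, "rb") for p in [base_path, *overlay_paths]]
    try:
        base_page = PdfFileReader(handles[0]).getPage(0)
        overlays = [PdfFileReader(h).getPage(0) for h in handles[1:]]
        writer = PdfFileWriter()
        for page in merged_copies(base_page, overlays):
            writer.addPage(page)
        with open(out_path, "wb") as f:
            writer.write(f)
    finally:
        # The readers fetch page data lazily, so close only after writing.
        for h in handles:
            h.close()
```

For the files in the question, the call would be build_output("test.pdf", ["tomerge1.pdf", "tomerge2.pdf"], "output.pdf"). Only one reader is opened per file, however many overlays there are.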