PDFBox Split into 3 at specified page numbers (to insert pdf)

Question

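I have read related posts, including How to merge two PDF files into one in Java? However, they only demonstrate how to split a document at every page or into equal chunks, and the merge API's addSource() seems to accept only File, String and InputStream, not PDDocument.

I would like to insert a one-page PDF file into 3 places of a larger PDF file (say 100 pages) at specified page numbers, e.g. pages 3, 7 and 10. So I need to split the larger document at pages 3, 7 and 10, insert the one-page PDF document at each split, and then merge all the parts together into a new PDF file.

I have tried the following: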

        PDDocument doc;
        PDDocument onePage;
        Splitter splitDoc = new Splitter();
        PDFMergerUtility mergedDoc = new PDFMergerUtility();

        onePage = PDDocument.load("/path/onepage.pdf");
        doc = PDDocument.load("/path/hundredpages.pdf");
        splitDoc.setSplitAtPage(1); // inefficient
        // is there a better solution for split?
        List<PDDocument> splitDocs = splitDoc.split(doc);

        for (int i=0; i<splitDocs.size(); i++) {

            if (i==2 || i==7 || i==10) { // only to demonstrate

                mergeFiles.addSource(onePage); // see comment below

            } else {

                // doesn't accept PDDocument 
                // what's the alternative without resorting to InputStream
                mergeFiles.addSource(splitDocs.remove(0)); 

            }


        }

        mergedDoc.setDestinationFileName("/path/mergeddoc.pdf");
        mergedDoc.mergeDocuments();

Where am I going wrong, or is there a better way to do this?

Answer 1

This answer is about what you actually want to achieve, i.e.

I would like to insert a one page pdf file into 3 places of a larger pdf file (say 100 pages) at specified pages numbers, e.g. pages 3, 7 and 10.

and not about what you think you have to do for that, i.e.

So, I need to split the larger document at page 3, 7, 10, then insert the one page pdf doc, and then merge all the splits parts together in a new pdf file.

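Furthermore, I assume you are still using a PDFBox version 1.8.x, not a 2.0.0 release candidate.

To insert pages into a document (represented by a PDDocument instance), you do not actually have to split and re-merge that document; you can simply add the page at the given index. Thus, we can simplify the approach.

At the same time, though, there is a detail in your task that complicates it again: you cannot insert the same page object multiple times into the same target document. You have to create at least a shallow copy of it for each additional insertion.

With this in mind, you can insert a one-page PDF file into 3 places of a larger PDF like this: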

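        import org.apache.pdfbox.cos.COSDictionary;
        import org.apache.pdfbox.pdmodel.PDDocument;
        import org.apache.pdfbox.pdmodel.PDPage;
        import org.apache.pdfbox.pdmodel.PDPageNode;

        PDDocument document = ...;
        PDDocument singlePageDocument = ...;
        PDPage singlePage = (PDPage) singlePageDocument.getDocumentCatalog().getAllPages().get(0);

        // Root of the target page tree; its kids list is assumed to hold all pages directly
        PDPageNode rootPages = document.getDocumentCatalog().getPages();
        // First insertion: the original page object becomes page 3 (index 3-1)
        rootPages.getKids().add(3-1, singlePage);
        singlePage.setParent(rootPages);
        // Further insertions need a shallow copy of the page dictionary each time
        singlePage = new PDPage(new COSDictionary(singlePage.getCOSDictionary()));
        rootPages.getKids().add(7-1, singlePage);
        singlePage = new PDPage(new COSDictionary(singlePage.getCOSDictionary()));
        rootPages.getKids().add(10-1, singlePage);
        // Refresh the page count after changing the kids list
        rootPages.updateCount();

        document.save(...);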

(InsertPages.java, method testInsertPages)
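Because every insertion shifts the following pages by one, the indices 3-1, 7-1 and 10-1 in the code above refer to positions in the growing document, so the inserted page ends up as pages 3, 7 and 10 of the result. The following is only a small sketch of that index arithmetic using a plain java.util.List of labels instead of PDFBox objects; the class InsertIndexSketch and the method insertAt are hypothetical names chosen for illustration. Unlike PDF page objects, the same String may be inserted repeatedly without copying.

```java
import java.util.ArrayList;
import java.util.List;

class InsertIndexSketch {
    /**
     * Returns a copy of pages in which extra has been inserted at each of the
     * given 1-based page numbers, applied one after another to the growing list.
     */
    static List<String> insertAt(List<String> pages, String extra, int... pageNumbers) {
        List<String> result = new ArrayList<>(pages);
        for (int pageNumber : pageNumbers) {
            // add(index, element) shifts all later entries one position to the right
            result.add(pageNumber - 1, extra);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            pages.add("p" + i);
        }
        // prints [p1, p2, X, p3, p4, p5, X, p6, p7, X, p8, p9, p10]
        System.out.println(insertAt(pages, "X", 3, 7, 10));
    }
}
```

If the page numbers were meant relative to the original document instead, one option would be to insert from the highest page number down to the lowest, so that earlier insertions do not move the later targets.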

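Beware, though: this code assumes a flat page tree. For deeper page trees the page list has to be traversed differently. To insert a page as the n-th document page, you cannot simply add it at position n-1 of the Pages root. Instead you have to inspect its kids one by one; whenever you come across an inner PDPageNode object, you have to read its Count value to check how many pages it contains, and if that number implies that the insertion position lies inside it, you have to recurse into that inner PDPageNode object.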
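The traversal described above can be sketched independently of PDFBox. The example below is a minimal illustration, not PDFBox code: Node is a hypothetical stand-in for both leaf pages and inner page tree nodes, its count field plays the role of the Count value, and insert and flatten are names made up for this sketch. A real implementation would work on PDPageNode and PDPage objects and would also have to set the parent of the inserted page.

```java
import java.util.ArrayList;
import java.util.List;

class PageTreeSketch {
    /** Hypothetical stand-in for a page tree entry: a leaf page or an inner node. */
    static final class Node {
        final String label;                      // page label; null for inner nodes
        final List<Node> kids = new ArrayList<>();
        int count;                               // number of leaf pages at or below this node

        private Node(String label, int count) {
            this.label = label;
            this.count = count;
        }

        static Node leaf(String label) {
            return new Node(label, 1);
        }

        static Node inner(Node... children) {
            Node node = new Node(null, 0);
            for (Node child : children) {
                node.kids.add(child);
                node.count += child.count;
            }
            return node;
        }

        boolean isLeaf() {
            return label != null;
        }
    }

    /** Inserts page so that it becomes the page at the 0-based index below node. */
    static void insert(Node node, int index, Node page) {
        if (index < 0 || index > node.count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int remaining = index;                   // pages still to skip at this level
        for (int i = 0; i < node.kids.size(); i++) {
            Node kid = node.kids.get(i);
            if (remaining == 0) {                // target position is right before this kid
                node.kids.add(i, page);
                node.count++;
                return;
            }
            if (!kid.isLeaf() && remaining < kid.count) {
                insert(kid, remaining, page);    // target position lies inside this subtree
                node.count++;
                return;
            }
            remaining -= kid.count;              // skip this leaf or whole subtree
        }
        node.kids.add(page);                     // target position is after the last kid
        node.count++;
    }

    /** Collects the leaf labels in document order. */
    static void flatten(Node node, List<String> out) {
        for (Node kid : node.kids) {
            if (kid.isLeaf()) {
                out.add(kid.label);
            } else {
                flatten(kid, out);
            }
        }
    }

    public static void main(String[] args) {
        Node root = Node.inner(
                Node.leaf("p1"),
                Node.inner(Node.leaf("p2"), Node.leaf("p3"), Node.leaf("p4")),
                Node.leaf("p5"));
        insert(root, 3 - 1, Node.leaf("X"));     // make X the third page
        List<String> pages = new ArrayList<>();
        flatten(root, pages);
        System.out.println(pages);               // prints [p1, p2, X, p3, p4, p5]
    }
}
```

The sketch updates the counts on the way back up the recursion, so every ancestor of the insertion point stays consistent without a separate recount pass.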

java

pdfbox