Pdfbox

Question

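I am using Pdfbox (1.8.8) to add attachments to a PDF. My problem is that when one of the attachments is a .pdf file and I save the PDDocument to an OutputStream, the final PDF document does not contain the attachments. If I save the PDDocument to a file instead of an OutputStream, everything works fine. Likewise, if none of the attachments is a PDF, saving to either a file or an OutputStream works fine.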

I would like to know whether there is any way to add a PDF as an embedded file, save the PDDocument to an OutputStream, and still keep the attachments in the resulting PDF.

The code I am using is:

 private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {

            final PDDocument doc;
            Boolean hasPdfAttach = false;
            try {
                doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
                // final PDFTextStripper pdfStripper = new PDFTextStripper();
                // final String text = pdfStripper.getText(doc);
                final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
                final Map embeddedFileMap = new HashMap();
                PDEmbeddedFile embeddedFile;
                File file = null;

                for (Attachment attach : attachmentsResources) {

                    // first create the file specification, which holds the embedded file
                    final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
                    fileSpecification.setFile(attach.getFilename());
                    file = AttachmentUtils.getAttachmentFile(attach);
                    final InputStream is = new FileInputStream(file.getAbsolutePath());

                    embeddedFile = new PDEmbeddedFile(doc, is);
                    // set some of the attributes of the embedded file
                    if ("application/pdf".equals(attach.getMimetype())) {
                        hasPdfAttach = true;
                    }
                    embeddedFile.setSubtype(attach.getMimetype());
                    embeddedFile.setSize((int) (long) attach.getFilesize());
                    fileSpecification.setEmbeddedFile(embeddedFile);

                    // now add the entry to the embedded file tree and set in the document.
                    embeddedFileMap.put(attach.getFilename(), fileSpecification);
                    // final String text2 = pdfStripper.getText(doc);
                }
                // final String text3 = pdfStripper.getText(doc);
                efTree.setNames(embeddedFileMap);
                // ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS); (this does not work for me)
                // attachments are stored as part of the "names" dictionary in the document catalog
                final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
                names.setEmbeddedFiles(efTree);
                doc.getDocumentCatalog().setNames(names);
                // final ByteArrayOutputStream pdfboxToDocumentStream = new ByteArrayOutputStream();
                final String tmpfile = "temporary.pdf";
                if (hasPdfAttach) {
                    final File f = new File(tmpfile);
                    doc.save(f);
                    doc.close();
                    // I have tried with the parser too, but without success
                    // PDFParser parser = new PDFParser(new FileInputStream(tmpfile));
                    // parser.parse();
                    // PDDocument doc2 = parser.getPDDocument();
                    final PDDocument doc2 = PDDocument.loadNonSeq(f, new RandomAccessFile(new File(getHomeTMP()
                            + "tempppp.pdf"), "r"));
                    doc2.save(out);
                    doc2.close();
                } else {
                    doc.save(out);
                    doc.close();
                }
                // that does not work either
                // final InputStream in = new FileInputStream(tmpfile);
                // IOUtils.copy(in, out);
                // out = new FileOutputStream(tmpFile);
                // doc.save (out);

            } catch (IOException e1) {
                e1.printStackTrace();
            } catch (Exception e2) {
                e2.printStackTrace();
            }
        }

Regards

Solution:

private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {

    final PDDocument doc;
    try {
        // 'out' already contains the source PDF: load it, then clear the stream so that
        // the saved result replaces the original instead of being appended after it
        doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
        ((ByteArrayOutputStream) out).reset();
        final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
        final Map<String, PDComplexFileSpecification> embeddedFileMap = new HashMap<String, PDComplexFileSpecification>();

        for (Attachment attach : attachmentsResources) {

            // first create the file specification, which holds the embedded file
            final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
            fileSpecification.setFile(attach.getFilename());
            final File file = AttachmentUtils.getAttachmentFile(attach);

            // the embedded file copies the attachment's bytes, so the input stream can be closed afterwards
            final PDEmbeddedFile embeddedFile;
            try (InputStream is = new FileInputStream(file)) {
                embeddedFile = new PDEmbeddedFile(doc, is);
            }
            // set some of the attributes of the embedded file
            embeddedFile.setSubtype(attach.getMimetype());
            embeddedFile.setSize((int) (long) attach.getFilesize());
            fileSpecification.setEmbeddedFile(embeddedFile);

            // now add the entry to the embedded file tree and set in the document.
            embeddedFileMap.put(attach.getFilename(), fileSpecification);

        }
        efTree.setNames(embeddedFileMap);
        // setNames() also writes a /Limits entry, which the root node of a name tree must not have
        ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
        // attachments are stored as part of the "names" dictionary in the document catalog
        final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
        names.setEmbeddedFiles(efTree);
        doc.getDocumentCatalog().setNames(names);
        doc.save(out);
        doc.close();

    } catch (IOException e1) {
        e1.printStackTrace();
    } catch (Exception e2) {
        e2.printStackTrace();
    }
}

Answer 1

You store the new PDF in out after the original PDF that is already there:

Look at all the uses of out in your method:

private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
    ...
            doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
    ...
                doc2.save(out);
    ...
                doc.save(out);

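So you receive a ByteArrayOutputStream and use its current content as input (i.e. the ByteArrayOutputStream is not empty but already contains a PDF), and after some processing you append the modified PDF to that same ByteArrayOutputStream. Depending on which PDF viewer you open the result in, you will see the original PDF, the processed PDF, or an error message saying (quite correctly) that the file is garbage.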
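To check whether this is what happens in your case, you can compare the bytes before and after the call. The following is a minimal stdlib-only sketch; the class AppendCheck and its method startsWithOriginal are hypothetical helpers, not part of PDFBox. It reports whether the bytes left in the stream still begin with the complete original PDF, which is the case when the stream was not cleared before saving.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AppendCheck {

    /**
     * Hypothetical diagnostic helper: returns true if result begins with every byte of original,
     * i.e. the old PDF was left in the stream and the new one was written after it.
     */
    static boolean startsWithOriginal(byte[] result, byte[] original) {
        if (original.length == 0 || result.length < original.length) {
            return false;
        }
        // compare only the first original.length bytes of result
        return Arrays.equals(Arrays.copyOfRange(result, 0, original.length), original);
    }

    public static void main(String[] args) {
        // placeholder byte strings standing in for real PDF files
        byte[] original = "%PDF-1.4 original %%EOF".getBytes(StandardCharsets.US_ASCII);
        byte[] processed = "%PDF-1.4 processed %%EOF".getBytes(StandardCharsets.US_ASCII);

        // what the question's code leaves in the stream: original bytes followed by the processed PDF
        byte[] appended = new byte[original.length + processed.length];
        System.arraycopy(original, 0, appended, 0, original.length);
        System.arraycopy(processed, 0, appended, original.length, processed.length);

        System.out.println(startsWithOriginal(appended, original));  // true
        System.out.println(startsWithOriginal(processed, original)); // false
    }
}
```

This is a heuristic: a full PDFBox save rewrites the whole file, so a correctly replaced result is not expected to start with the entire original, and a `true` here is a strong hint that the stream was never cleared.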

If you want the ByteArrayOutputStream to contain only the processed PDF, simply add

((ByteArrayOutputStream) out).reset();

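or (if you are unsure about the state of the stream; note, however, that reassigning out only rebinds the method's local parameter, so the caller's ByteArrayOutputStream will not receive the result)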

out = new ByteArrayOutputStream();

right after

doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
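The effect of these options can be reproduced without PDFBox. This is a minimal sketch with hypothetical names (StreamResetDemo, process, callerSees); process() merely stands in for "load the PDF, add attachments, save" by appending a marker to the bytes. It shows that toByteArray() copies the content but leaves it in the stream, that reset() empties the very stream object the caller holds, and that reassigning the out parameter leaves the caller's stream untouched.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamResetDemo {

    // Hypothetical stand-in for "load PDF, add attachments, save": appends a marker to the input.
    static byte[] process(byte[] input) {
        String text = new String(input, StandardCharsets.US_ASCII) + "+attachments";
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // The question's pattern: toByteArray() copies the content but does not remove it.
    static void withoutReset(OutputStream out) throws IOException {
        byte[] original = ((ByteArrayOutputStream) out).toByteArray();
        out.write(process(original)); // appended after the original bytes
    }

    // reset() empties the same stream object the caller holds.
    static void withReset(OutputStream out) throws IOException {
        byte[] original = ((ByteArrayOutputStream) out).toByteArray();
        ((ByteArrayOutputStream) out).reset();
        out.write(process(original));
    }

    // Reassigning the parameter only changes the method's local variable.
    static void withReassignment(OutputStream out) throws IOException {
        byte[] original = ((ByteArrayOutputStream) out).toByteArray();
        out = new ByteArrayOutputStream();
        out.write(process(original)); // written to a stream the caller never sees
    }

    // Runs one variant on a stream pre-filled with "PDF" and returns what the caller sees afterwards.
    static String callerSees(String variant) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            baos.write("PDF".getBytes(StandardCharsets.US_ASCII));
            switch (variant) {
                case "withoutReset": withoutReset(baos); break;
                case "withReset": withReset(baos); break;
                case "withReassignment": withReassignment(baos); break;
                default: throw new IllegalArgumentException(variant);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(baos.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(callerSees("withoutReset"));     // PDFPDF+attachments
        System.out.println(callerSees("withReset"));        // PDF+attachments
        System.out.println(callerSees("withReassignment")); // PDF
    }
}
```

So when the caller reads the same ByteArrayOutputStream it passed in, reset() is the variant that actually delivers only the processed PDF.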

PS: According to the comments, the OP tried the change above in his code, but without success.

I cannot run the code from the question because it is not self-contained, so I reduced it to the essentials needed for a self-contained test:

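import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;
import org.junit.Test;

// PDFBox 1.8.x API (COSVisitorException no longer exists in 2.0)
@Test
public void test() throws IOException, COSVisitorException
{
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (
            InputStream sourceStream = getClass().getResourceAsStream("test.pdf");
            InputStream attachStream = getClass().getResourceAsStream("artificial text.pdf"))
    {
        final PDDocument document = PDDocument.load(sourceStream);

        final PDEmbeddedFile embeddedFile = new PDEmbeddedFile(document, attachStream);
        embeddedFile.setSubtype("application/pdf");
        embeddedFile.setSize(10993); // size in bytes of the attached test file

        final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
        fileSpecification.setFile("artificial text.pdf");
        fileSpecification.setEmbeddedFile(embeddedFile);

        final Map<String, PDComplexFileSpecification> embeddedFileMap = new HashMap<String, PDComplexFileSpecification>();
        embeddedFileMap.put("artificial text.pdf", fileSpecification);

        final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
        efTree.setNames(embeddedFileMap);

        final PDDocumentNameDictionary names = new PDDocumentNameDictionary(document.getDocumentCatalog());
        names.setEmbeddedFiles(efTree);
        document.getDocumentCatalog().setNames(names);

        // save to a stream only; the file is written afterwards from the collected bytes
        document.save(baos);
        document.close();
    }
    Files.write(Paths.get("attachment.pdf"), baos.toByteArray());
}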

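As you can see, PDFBox works exclusively with streams here, and the resulting attachment.pdf contains the embedded PDF attachment.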

So PDFBox can store a PDF with an embedded PDF file attachment without any problem.

Therefore, the problem is most likely not caused by this workflow as such.

Pdfbox - adding pdf embedded File and save the PDDocument to OutputStream does not keep the embedded Files
