Apache PDFBox Merge Error - java.io.IOException: Missing root object specification in trailer

Question

I am trying to merge existing PDF documents from two InputStreams using the PDFMergerUtility.mergeDocuments() method in PDFBox. Here is my code; the entry point is pullDocumentsIntoSystem():

private boolean pullDocumentsIntoSystem(final String id, final String filePathAndName, final List<Letter> parsedLetters)
        throws IOException {

    final List<InputStream> pdfStreams = new ArrayList<InputStream>();
    final ByteArrayOutputStream mergedPdfOutputStream = new ByteArrayOutputStream();

    // make a call to retrieve each document
    for (final Letter letter : parsedLetters) {
        pdfStreams.add(this.getSpecificDocument(letter.getKey(), id));
    }

    // merge all the documents together
    this.mergePdfDocuments(pdfStreams, mergedPdfOutputStream);

    // write file to directory
    this.writeMergedPdfDocument(mergedPdfOutputStream, filePathAndName); //...more code below...

}

private InputStream getSpecificDocument(final String id, final String key) throws IOException {

    HttpURLConnection conn = null;
    InputStream pdfStream = null;

    try {
        final String url = this.getBaseURL() + "/letter/" + id + "/documents/" + key;

        conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("X-Letter-Authentication", this.getAuthenticationHeader());
        conn.setRequestProperty("Accept", "application/pdf");
        conn.setRequestProperty("Content-Type", "application/pdf");
        conn.setDoOutput(true);          

        pdfStream = conn.getInputStream();

    }
    finally {
        this.disconnect(conn);
    }

    return pdfStream;
}

private void mergePdfDocuments(final List<InputStream> pdfStreams, final ByteArrayOutputStream mergedPdfOutputStream)
        throws IOException {

    final PDFMergerUtility merger = new PDFMergerUtility();

    merger.addSources(pdfStreams);

    merger.setDestinationStream(mergedPdfOutputStream);
    merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());  // ERROR THROWN HERE
}

This is the error I get at the line marked with the comment above:

Caused by: java.io.IOException: Missing root object specification in trailer.   
at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2832) ~[pdfbox-2.0.11.jar:2.0.11]     
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:173) ~[pdfbox-2.0.11.jar:2.0.11]   
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:220) ~[pdfbox-2.0.11.jar:2.0.11]  
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1144) ~[pdfbox-2.0.11.jar:2.0.11]  
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1060) ~[pdfbox-2.0.11.jar:2.0.11]  
at org.apache.pdfbox.multipdf.PDFMergerUtility.legacyMergeDocuments(PDFMergerUtility.java:379) ~[pdfbox-2.0.11.jar:2.0.11]  
at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:280) ~[pdfbox-2.0.11.jar:2.0.11]

I am using PDFBox 2.0.11.

In case it matters, each InputStream in my list comes from a separate HttpURLConnection.getInputStream() call. I have confirmed that the HttpURLConnection calls do return the documents.
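One way to make that confirmation concrete is to look at the first bytes of each response: a well-formed PDF normally begins with the marker %PDF-. The sketch below is only an illustration; the class and method names are hypothetical and not part of the code above, and it assumes the response body has already been read into a byte array while the connection was still open.

```java
import java.nio.charset.StandardCharsets;

class PdfHeaderCheck {

    // Returns true if the data begins with the "%PDF-" marker.
    // An empty or truncated download, or an HTML error page, fails this check.
    static boolean startsWithPdfHeader(final byte[] data) {
        final byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A check like this only shows that the bytes look like the start of a PDF. It says nothing about whether the rest of the file, including the trailer, arrived intact.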

Update

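Following @Tilman Hausherr's suggestion below, I tested the same functionality without using InputStreams. If I use PDFMergerUtility.addSource(File source) instead of PDFMergerUtility.addSources(List<InputStream>), the merge succeeds. So it looks like something about my InputStreams is not working properly.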
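For anyone who wants to try the File-based route with downloaded documents, a minimal sketch is to spool each stream to a temporary file first and hand the files to the merger. The class and method names here are hypothetical, and the copy has to happen while the source stream is still open and readable.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class PdfSpooler {

    // Copies the whole stream into a temp file and returns that file.
    // createTempFile() already creates an empty file, so REPLACE_EXISTING is required.
    static File spoolToTempFile(final InputStream in) throws IOException {
        final Path tmp = Files.createTempFile("letter-", ".pdf");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        final File file = tmp.toFile();
        file.deleteOnExit(); // best-effort cleanup when the JVM exits
        return file;
    }
}
```

Each returned file could then be passed to PDFMergerUtility.addSource(File source), the call that worked in the test described above. The files on disk can also be opened in a PDF viewer to check what was actually downloaded.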

Any help is appreciated, and I am happy to provide more information if needed.

Thank you for your time!

Answer 1

There may be a problem with the input stream. Try adding the application/pdf MIME type.

Answer 2

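In the end this was a really silly mistake: I was closing the HttpURLConnection too early. The finally block in getSpecificDocument() disconnects the connection before the merger has read anything from the returned InputStream. If I remove the this.disconnect() call at the end of getSpecificDocument(), everything works.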
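An alternative to removing the disconnect is to read the whole response into memory before disconnecting and return a stream over that copy. This is only a sketch under assumptions: the class name, the fetchPdf() method and its parameters are hypothetical stand-ins for getSpecificDocument(), and it assumes the documents are small enough to hold in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class BufferedPdfFetch {

    // Reads the stream to the end and returns its contents as a byte array.
    static byte[] readFully(final InputStream in) throws IOException {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        final byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Hypothetical variant of getSpecificDocument(): the body is fully copied
    // BEFORE the connection is disconnected, so the returned stream stays readable.
    static InputStream fetchPdf(final String url, final String authHeader) throws IOException {
        final HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setRequestProperty("X-Letter-Authentication", authHeader);
            conn.setRequestProperty("Accept", "application/pdf");
            try (InputStream body = conn.getInputStream()) {
                return new ByteArrayInputStream(readFully(body));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With this shape the connection is still released in the finally block, and the streams handed to PDFMergerUtility.addSources() no longer depend on it.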

Well, hopefully this helps someone else.

Thanks to @Фарид Азаев and @Tilman Hausherr for the guidance!
