OpenPDF/iText corrupt documents

Question

I've been trying to re-implement in Scala the concatenation examples from OpenPDF 1.2.4 and 1.2.11:

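import java.io.ByteArrayOutputStream
import com.lowagie.text.Document
import com.lowagie.text.pdf.{PdfCopy, PdfReader}

// `log` and `getPageSize` are helpers that are not shown here.
def mergePdfs(docs: Seq[Array[Byte]]): Array[Byte] = {
  log.debug(s"merging ${docs.size} PDFs")
  val output = new ByteArrayOutputStream()
  val document = new Document()
  val copy = new PdfCopy(document, output)
  getPageSize(docs.headOption) foreach document.setPageSize
  document.open()
  docs foreach { doc =>
    val reader = new PdfReader(doc)
    // PDF page numbers are 1-based
    1 to reader.getNumberOfPages foreach { pageNum =>
      copy.addPage(copy.getImportedPage(reader, pageNum))
    }
  }
  // closing the document finishes the PDF written to `output`
  document.close()
  output.toByteArray
}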

Here is an example output document. I generated it from two copies of this and then three copies of this.

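I've found two problems:

~~- The document is corrupt (it only opens in Firefox), in part because there is a line of content directly between the header and the first object. Deleting the offending line does not fix the document in client code~~ Wrong, thanks @mkl!

- Some pages (usually one, but it is non-deterministic) show up blank. I have not seen any pattern. In addition, the text of each page appears twice in the file. For the example above: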

$ strings out.pdf | grep "A Simple PDF File" | wc -l | tr -d ' '
6

In one case, I used vim to delete the first content stream, which caused the text to appear on the first page.

Am I misusing the API somehow?

Answer 1

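The first 17465 bytes of the result file are the actual result of your code ("two copies of this and then three copies of this"). The remaining bytes of the 31181-byte file consist of fragments of some other PDF.

In a comment you said you are "calling Files.write with the resulting byte array." Which OpenOption values are you using? Perhaps CREATE but not TRUNCATE_EXISTING?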
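
To illustrate the suspicion, here is a minimal sketch of how the options change what ends up on disk. The object and method names (`WriteDemo`, `writeKeepingTail`, `writeTruncating`) are made up for this example and are not part of OpenPDF or of your code. When explicit options are passed to `Files.write`, only those options apply, so `CREATE` plus `WRITE` overwrites an existing file from the start and leaves any longer old content in place after the new bytes.

```scala
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical helper object, for illustration only.
object WriteDemo {

  // CREATE + WRITE without TRUNCATE_EXISTING: if `path` already holds a
  // longer file, the old bytes past `bytes.length` stay in the file.
  def writeKeepingTail(path: Path, bytes: Array[Byte]): Unit =
    Files.write(path, bytes,
      StandardOpenOption.CREATE,
      StandardOpenOption.WRITE)

  // Adding TRUNCATE_EXISTING empties an existing file before writing,
  // so the file ends up with exactly `bytes`.
  def writeTruncating(path: Path, bytes: Array[Byte]): Unit =
    Files.write(path, bytes,
      StandardOpenOption.CREATE,
      StandardOpenOption.WRITE,
      StandardOpenOption.TRUNCATE_EXISTING)
}
```

Calling `Files.write(path, bytes)` with no options at all behaves like the second variant, because the defaults are `CREATE`, `TRUNCATE_EXISTING` and `WRITE`. A quick way to check whether this is what happened is to compare `Files.size(path)` with the length of the array returned by `mergePdfs`.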

scala

itext

openpdf