How can I combine multiple PDF files excluding page breaks using iTextSharp?

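I was wondering if anyone has done this with iTextSharp: I would like to combine multiple PDF files into one but leave the page breaks out. For example, I would like to create 4 PDF files containing 3 lines of text each, so I want the resulting file to have all 12 lines in 1 page. Is this possible?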

As the OP has also tagged the question with [iText], and as I am more at home in Java than in .Net, here is an answer for iText/Java. Translating it to iTextSharp/C# should be easy.

The original question

I would like to combine multiple PDF files into one but leave the page breaks out. For example, I would like to create 4 PDF files containing 3 lines of text each, so I want the resulting file to have all 12 lines in 1 page.

For PDF files like those in that example, you can use this simple utility class:

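import java.io.IOException;
import java.io.OutputStream;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.TextMarginFinder;

// Stacks the text areas of the input pages onto as few output pages as possible.
public class PdfDenseMergeTool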
{
    public PdfDenseMergeTool(Rectangle size, float top, float bottom, float gap)
    {
        this.pageSize = size;
        this.topMargin = top;
        this.bottomMargin = bottom;
        this.gap = gap;
    }

    public void merge(OutputStream outputStream, Iterable<PdfReader> inputs) throws DocumentException, IOException
    {
        try
        {
            openDocument(outputStream);
            for (PdfReader reader: inputs)
            {
                merge(reader);
            }
        }
        finally
        {
            closeDocument();
        }

    }

    void openDocument(OutputStream outputStream) throws DocumentException
    {
        final Document document = new Document(pageSize, 36, 36, topMargin, bottomMargin);
        final PdfWriter writer = PdfWriter.getInstance(document, outputStream);
        document.open();
        this.document = document;
        this.writer = writer;
        newPage();
    }

    void closeDocument()
    {
        try
        {
            // document is still null if openDocument failed before assigning it
            if (document != null)
            {
                document.close();
            }
        }
        finally
        {
            this.document = null;
            this.writer = null;
            this.yPosition = 0;
        }
    }

    void newPage()
    {
        document.newPage();
        yPosition = pageSize.getTop(topMargin);
    }

    void merge(PdfReader reader) throws IOException
    {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        for (int page = 1; page <= reader.getNumberOfPages(); page++)
        {
            merge(reader, parser, page);
        }
    }

    void merge(PdfReader reader, PdfReaderContentParser parser, int page) throws IOException
    {
        TextMarginFinder finder = parser.processContent(page, new TextMarginFinder());
        Rectangle pageSizeToImport = reader.getPageSize(page);
        float heightToImport = finder.getHeight();
        float maxHeight = pageSize.getHeight() - topMargin - bottomMargin;
        if (heightToImport > maxHeight)
        {
            throw new IllegalArgumentException(String.format("Page %s content too large; height: %s, limit: %s.", page, heightToImport, maxHeight));
        }

        if (heightToImport > yPosition - pageSize.getBottom(bottomMargin))
        {
            newPage();
        }
        else if (!writer.isPageEmpty())
        {
            heightToImport += gap;
        }
        yPosition -= heightToImport;

        // Shift the imported page vertically so that the lower edge of its text area lands on yPosition.
        PdfImportedPage importedPage = writer.getImportedPage(reader, page);
        writer.getDirectContent().addTemplate(importedPage, 0, yPosition - (finder.getLly() - pageSizeToImport.getBottom()));
    }

    Document document = null;
    PdfWriter writer = null;
    float yPosition = 0; 

    final Rectangle pageSize;
    final float topMargin;
    final float bottomMargin;
    final float gap;
}

If you have a list of PdfReader instances, inputs, you can merge them into an OutputStream output like this:

PdfDenseMergeTool tool = new PdfDenseMergeTool(PageSize.A4, 18, 18, 5);
tool.merge(output, inputs);

This creates a merged document with an A4 page size, top and bottom margins of 18/72" each, and a gap of 5/72" between the contents of different PDF pages.
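To make the margin and gap arithmetic above easier to follow, here is a minimal sketch that replays the same vertical bookkeeping without iText. The class DenseLayoutSketch, its Placement type, and the block heights are made up for illustration; only the formulas are taken from PdfDenseMergeTool, and the page is assumed to start at y = 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, iText-free model of the vertical placement arithmetic in PdfDenseMergeTool. */
final class DenseLayoutSketch {

    /** Result for one imported block: output page number and y coordinate of its lower edge. */
    static final class Placement {
        final int page;
        final float lowerEdge;

        Placement(int page, float lowerEdge) {
            this.page = page;
            this.lowerEdge = lowerEdge;
        }
    }

    static List<Placement> place(float pageHeight, float top, float bottom, float gap, float[] heights) {
        List<Placement> result = new ArrayList<>();
        float maxHeight = pageHeight - top - bottom;
        int page = 1;
        float yPosition = pageHeight - top;   // upper edge of the usable area
        boolean pageEmpty = true;
        for (float height : heights) {
            if (height > maxHeight) {
                throw new IllegalArgumentException("Content too large: " + height + " > " + maxHeight);
            }
            float heightToImport = height;
            if (heightToImport > yPosition - bottom) {
                // Not enough room left: start a fresh page, no gap needed at its top.
                page++;
                yPosition = pageHeight - top;
            } else if (!pageEmpty) {
                // Something is already on this page: keep a gap above the new block.
                heightToImport += gap;
            }
            yPosition -= heightToImport;
            pageEmpty = false;
            result.add(new Placement(page, yPosition));
        }
        return result;
    }

    public static void main(String[] args) {
        // A4 is 842 points high; margins 18, gap 5, four blocks of 40 points each (made-up heights).
        for (Placement p : place(842f, 18f, 18f, 5f, new float[] {40f, 40f, 40f, 40f})) {
            System.out.println("page " + p.page + ", lower edge at y = " + p.lowerEdge);
        }
    }
}
```

With these numbers all four blocks stay on page 1, with lower edges at y = 784, 739, 694 and 649: the first block starts directly below the top margin, and each later block costs its own height plus the 5 point gap.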

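Comments

The iText TextMarginFinder (used in the PdfDenseMergeTool above) only takes text into account. If other content types are to be considered as well, this class has to be extended somewhat.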

Each PDF has just a few lines, perhaps a table or an image, but I want the end result in one page.

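If the tables contain decorations that reach above or below the text content (e.g. lines or colored backgrounds), you should use a larger gap value. Unfortunately the parsing framework used by the TextMarginFinder does not forward vector graphics commands to the finder.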

If the images are bitmap images, the TextMarginFinder should be extended by implementing its renderImage method so that the image areas are taken into account, too.
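The geometry behind such an extension can be sketched without iText. A PDF bitmap image is painted into the unit square transformed by the current transformation matrix [a b c d e f], so its area is the bounding box of that transformed square, and the finder's rectangle would have to grow to the union of the text box and the image box. The class ImageBoxSketch and its methods are hypothetical; in a real extension the six matrix values would have to come from the render info that iText passes to renderImage.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/** Hypothetical helper showing how an image area could be merged into a text bounding box. */
final class ImageBoxSketch {

    /** Bounding box of an image whose transformation matrix is the PDF matrix [a b c d e f]. */
    static Rectangle2D imageBox(double a, double b, double c, double d, double e, double f) {
        // AffineTransform takes its six values in the same order as a PDF matrix.
        AffineTransform ctm = new AffineTransform(a, b, c, d, e, f);
        Rectangle2D unitSquare = new Rectangle2D.Double(0, 0, 1, 1);
        return ctm.createTransformedShape(unitSquare).getBounds2D();
    }

    /** Grows the box found so far (null if nothing was found yet) to include the image box. */
    static Rectangle2D extend(Rectangle2D boxSoFar, Rectangle2D imageBox) {
        return boxSoFar == null ? imageBox : boxSoFar.createUnion(imageBox);
    }

    public static void main(String[] args) {
        // Made-up numbers: text occupies y = 700..750, a 100 x 80 image sits below it at (72, 600).
        Rectangle2D text = new Rectangle2D.Double(72, 700, 200, 50);
        Rectangle2D combined = extend(text, imageBox(100, 0, 0, 80, 72, 600));
        System.out.println("lower edge: " + combined.getMinY() + ", height: " + combined.getHeight());
    }
}
```

In this made-up case the lower edge moves from 700 down to 600 and the height to import grows from 50 to 150, which is exactly the information the merge tool needs from the finder.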

Also, some of the PDFs may contain fields, so I'd like to keep those fields in the resulting combined PDF as well.

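If AcroForm fields are to be considered as well, you have to

  1. extend the rectangle represented by the TextMarginFinder so that it also includes the visualization rectangles of the widget annotations, and
  2. extend the PdfDenseMergeTool.merge(PdfReader, PdfReaderContentParser, int) method to also copy those widget annotations.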
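For the second step, a copied widget has to move together with the page content it belongs to. As a rough sketch, assuming unrotated pages and a widget rectangle given as [llx, lly, urx, ury] in the source page's coordinates, the rectangle is shifted by the same vertical offset that the merge method passes to addTemplate. The class WidgetShiftSketch and its methods are hypothetical; reading the annotation from the source and writing it to the target is left out.

```java
/** Hypothetical helper: moves a widget rectangle along with the imported page content. */
final class WidgetShiftSketch {

    /** Same vertical offset as in merge(): yPosition - (finder.getLly() - pageSizeToImport.getBottom()). */
    static float verticalOffset(float yPosition, float contentLly, float sourcePageBottom) {
        return yPosition - (contentLly - sourcePageBottom);
    }

    /** Returns a copy of rect = [llx, lly, urx, ury] translated by (dx, dy). */
    static float[] shift(float[] rect, float dx, float dy) {
        return new float[] {rect[0] + dx, rect[1] + dy, rect[2] + dx, rect[3] + dy};
    }

    public static void main(String[] args) {
        // Made-up numbers: content bottom at y = 700 on the source page, target position y = 784.
        float dy = verticalOffset(784f, 700f, 0f);
        float[] moved = shift(new float[] {72f, 700f, 172f, 720f}, 0f, dy);
        System.out.println("widget now spans y = " + moved[1] + " .. " + moved[3]);
    }
}
```

The horizontal offset is 0 because the merge method only moves content vertically. Pages with a rotation entry would need the full transformation rather than a plain translation.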

Update

Above I wrote

Unfortunately the parsing framework used by the TextMarginFinder does not forward vector graphics commands to the finder.

Meanwhile (in version 5.5.6) the parsing framework has been extended to also forward vector graphics commands.

If you replace the line

TextMarginFinder finder = parser.processContent(page, new TextMarginFinder());

with

MarginFinder finder = parser.processContent(page, new MarginFinder());

using the MarginFinder class presented at the bottom of this answer, all content is considered, not only text.

For those who want the above code in C#, here you go.

using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace Test.WebService.Support {

  public class PDFMerge {

    private Rectangle PageSize;
    private float TopMargin;
    private float BottomMargin;
    private float Gap;
    private Document Document = null;
    private PdfWriter Writer = null;
    private float YPosition = 0;

    public PDFMerge(Rectangle size, float top, float bottom, float gap) {
      this.PageSize = size;
      this.TopMargin = top;
      this.BottomMargin = bottom;
      this.Gap = gap;
    } // PDFMerge

    public void Merge(MemoryStream outputStream, List<PdfReader> inputs) {
      try {
        this.OpenDocument(outputStream);

        foreach (PdfReader reader in inputs) {
          this.Merge(reader);
        }
      } finally {
        this.CloseDocument();
      }
    } // Merge

    private void Merge(PdfReader reader) {
      PdfReaderContentParser parser = new PdfReaderContentParser(reader);

      for (int p = 1; p <= reader.NumberOfPages; p++) {
        this.Merge(reader, parser, p);
      }
    } // Merge

    private void Merge(PdfReader reader, PdfReaderContentParser parser, int pageIndex) {
      TextMarginFinder Finder = parser.ProcessContent(pageIndex, new TextMarginFinder());
      Rectangle PageSizeToImport = reader.GetPageSize(pageIndex);
      float HeightToImport = Finder.GetHeight();
      float MaxHeight = PageSize.Height - TopMargin - BottomMargin;

      if (HeightToImport > MaxHeight) {
        throw new ArgumentException(string.Format("Page {0} content too large; height: {1}, limit: {2}.", pageIndex, HeightToImport, MaxHeight));
      }

      if (HeightToImport > YPosition - PageSize.GetBottom(BottomMargin)) {
        this.NewPage();
      } else if (!Writer.PageEmpty) {
        HeightToImport += Gap;
      }

      YPosition -= HeightToImport;

      PdfImportedPage ImportedPage = Writer.GetImportedPage(reader, pageIndex);
      Writer.DirectContent.AddTemplate(ImportedPage, 0, YPosition - (Finder.GetLly() - PageSizeToImport.Bottom));
    } // Merge

    private void OpenDocument(MemoryStream outputStream) {
      Document Document = new Document(PageSize, 36, 36, this.TopMargin, BottomMargin);
      PdfWriter Writer = PdfWriter.GetInstance(Document, outputStream);
      Document.Open();
      this.Document = Document;
      this.Writer = Writer;
      this.NewPage();
    } // OpenDocument

    private void CloseDocument() {
      try {
        // Document is still null if OpenDocument failed before assigning it
        if (Document != null) {
          Document.Close();
        }
      } finally {
        this.Document = null;
        this.Writer = null;
        this.YPosition = 0;
      }
    } // CloseDocument

    private void NewPage() {
      Document.NewPage();
      YPosition = PageSize.GetTop(TopMargin);
    } // NewPage

  }
}