Merge PDF files with TOC element

I'm merging PDF files using GemBox.Pdf as shown here. This works well, and I can easily add outlines.

I've done something similar before, merging Word files with GemBox.Document as shown here.

But now my problem is that there is no TOC element in GemBox.Pdf. I'd like to get a table of contents automatically when merging multiple PDF files into one.

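Am I missing something, or is there really no such PDF element?
Do I need to recreate it? If so, how would I do that?
I can add bookmarks, but I don't know how to add links.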

There is no such element in PDF files, so we need to create this content ourselves.

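Now, one approach would be to create text elements, outlines, and link annotations, position them appropriately, and then set the link destinations to the outlines.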
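To give a feel for the bookkeeping this manual approach involves, here is a minimal sketch that only computes where each TOC line would sit and which merged page it should link to. It makes no GemBox.Pdf calls. `TocRow`, `ManualTocLayout`, and the default page metrics are hypothetical names and values chosen for illustration, and the sketch assumes that all entries fit on the reserved TOC pages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type, not part of GemBox.Pdf.
// Bottom and Height describe the row's rectangle in PDF points.
public record TocRow(string Title, int TargetPageIndex, double Bottom, double Height);

public static class ManualTocLayout
{
    // files: title and page count of each PDF being merged, in merge order.
    // tocPageCount: pages reserved for the TOC at the front (an assumption).
    // pageHeight, topMargin, lineHeight: illustrative values in PDF points.
    public static List<TocRow> Build(
        IReadOnlyList<(string Title, int PageCount)> files,
        int tocPageCount = 1,
        double pageHeight = 842,
        double topMargin = 72,
        double lineHeight = 20)
    {
        var rows = new List<TocRow>();

        // Zero-based index of the first merged page, right after the TOC.
        int targetPageIndex = tocPageCount;

        // PDF coordinates grow upwards, so the first row is near the top.
        double bottom = pageHeight - topMargin - lineHeight;

        foreach (var (title, pageCount) in files)
        {
            rows.Add(new TocRow(title, targetPageIndex, bottom, lineHeight));
            targetPageIndex += pageCount; // the next file starts after this one
            bottom -= lineHeight;         // the next row sits one line lower
        }

        return rows;
    }
}
```

Each row would then be drawn as text and covered by a link annotation whose destination is the page at `TargetPageIndex`, with a matching outline if bookmarks are wanted. Those drawing and annotation calls are deliberately left out here.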

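However, this could take quite a bit of work, so it may be easier to create the desired TOC element with GemBox.Document, save it as a PDF file, and then import it into the resulting PDF.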
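This only works if the placeholder document has exactly as many pages after the TOC as the merged files have in total, since that is what makes the page numbers in the TOC correct. A small sketch of that arithmetic, using plain integers and no GemBox calls; `PlaceholderMath`, `StartPages`, and `PlaceholderBreaks` are hypothetical names, and every file is assumed to have at least one page:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlaceholderMath
{
    // Returns the 1-based page number each TOC entry should show,
    // assuming the TOC itself occupies tocPageCount pages at the front.
    public static int[] StartPages(IReadOnlyList<int> pageCounts, int tocPageCount)
    {
        var startPages = new int[pageCounts.Count];
        int nextPage = tocPageCount + 1;

        for (int i = 0; i < pageCounts.Count; i++)
        {
            startPages[i] = nextPage;   // heading sits on the file's first page
            nextPage += pageCounts[i];  // skip over the rest of this file
        }

        return startPages;
    }

    // Page breaks needed after the TOC so that the placeholder section
    // spans Sum(pageCounts) pages: one break per page, minus the last one,
    // which would otherwise add a trailing empty page.
    public static int PlaceholderBreaks(IReadOnlyList<int> pageCounts)
        => pageCounts.Sum() - 1;
}
```

For files with 3, 1, and 2 pages behind a one-page TOC, the entries point at pages 2, 5, and 6, and five page breaks are needed after the TOC. The full GemBox-based code follows.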

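using System.Collections.Generic;
using System.IO;
using System.Linq;
using GemBox.Document;
using GemBox.Pdf;
using GemBox.Pdf.Objects;

// Source data for creating TOC entries with specified text and associated PDF files.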
var pdfEntries = new[]
{
    new { Title = "First Document Title", Pdf = PdfDocument.Load("input1.pdf") },
    new { Title = "Second Document Title", Pdf = PdfDocument.Load("input2.pdf") },
    new { Title = "Third Document Title", Pdf = PdfDocument.Load("input3.pdf") },
};

/***************************************************************/
/* Create new document with TOC element using GemBox.Document. */
/***************************************************************/

// Create new document.
var tocDocument = new DocumentModel();
var section = new Section(tocDocument);
tocDocument.Sections.Add(section);

// Create and add TOC element.
var toc = new TableOfEntries(tocDocument, FieldType.TOC);
section.Blocks.Add(toc);
section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));

// Create heading style.
// By default, updating the TOC element creates a TOC entry for each paragraph that has a heading style.
var heading1Style = (ParagraphStyle)tocDocument.Styles.GetOrAdd(StyleTemplateType.Heading1);

// Add headings and empty (placeholder) pages.
// The number of placeholder pages added depends on the number of pages in the actual PDF file, so that the TOC entries have correct page numbers.
int totalPageCount = 0;
foreach (var pdfEntry in pdfEntries)
{
    section.Blocks.Add(new Paragraph(tocDocument, pdfEntry.Title) { ParagraphFormat = { Style = heading1Style } });
    section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));

    int currentPageCount = pdfEntry.Pdf.Pages.Count;
    totalPageCount += currentPageCount;

    while (--currentPageCount > 0)
        section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
}

// Remove last extra-added empty page.
section.Blocks.RemoveAt(section.Blocks.Count - 1);

// Update TOC element and save the document as PDF stream.
toc.Update();
var pdfStream = new MemoryStream();
tocDocument.Save(pdfStream, new GemBox.Document.PdfSaveOptions());

/***************************************************************/
/* Merge PDF files into PDF with TOC element using GemBox.Pdf. */
/***************************************************************/

// Load a PDF stream using GemBox.Pdf.
var pdfDocument = PdfDocument.Load(pdfStream);
var rootDictionary = (PdfDictionary)((PdfIndirectObject)pdfDocument.GetDictionary()[PdfName.Create("Root")]).Value;
var pagesDictionary = (PdfDictionary)((PdfIndirectObject)rootDictionary[PdfName.Create("Pages")]).Value;
var kidsArray = (PdfArray)pagesDictionary[PdfName.Create("Kids")];
var pageIds = kidsArray.Cast<PdfIndirectObject>().Select(obj => obj.Id).ToArray();

// Remove empty (placeholder) pages.
while (totalPageCount-- > 0)
    pdfDocument.Pages.RemoveAt(pdfDocument.Pages.Count - 1);

// Add pages from PDF files.
foreach (var pdfEntry in pdfEntries)
    foreach (var page in pdfEntry.Pdf.Pages)
        pdfDocument.Pages.AddClone(page);

/*****************************************************************************/
/* Update TOC links from placeholder pages to actual pages using GemBox.Pdf. */
/*****************************************************************************/

// Create a mapping from the ID of an empty (placeholder) page indirect object to the actual page indirect object at the same position.
var pageCloneMap = new Dictionary<PdfIndirectObjectIdentifier, PdfIndirectObject>();
for (int i = 0; i < kidsArray.Count; ++i)
    pageCloneMap.Add(pageIds[i], (PdfIndirectObject)kidsArray[i]);

foreach (var entry in pageCloneMap)
{
    // If page was updated, it means that we passed TOC pages, so break from the loop.
    if (entry.Key != entry.Value.Id)
        break;

    // For each TOC page, get its 'Annots' entry.
    // For each link annotation from the 'Annots' get the 'Dest' entry.
    // Update the first item in the 'Dest' array so that it no longer points to a removed page.
    if (((PdfDictionary)entry.Value.Value).TryGetValue(PdfName.Create("Annots"), out PdfBasicObject annotsObj))
        foreach (PdfIndirectObject annotObj in (PdfArray)annotsObj)
            if (((PdfDictionary)annotObj.Value).TryGetValue(PdfName.Create("Dest"), out PdfBasicObject destObj))
            {
                var destArray = (PdfArray)destObj;
                destArray[0] = pageCloneMap[((PdfIndirectObject)destArray[0]).Id];
            }
}

// Save resulting PDF file.
pdfDocument.Save("Result.pdf");
pdfDocument.Close();

This way you can easily customize the TOC element using TOC switches and styles. For more information, see the Table Of Content example from GemBox.Document.
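The least obvious step in the code above is the link update: the TOC links still point at the removed placeholder pages, and they are redirected by position, since page i before the swap corresponds to page i after it. The following sketch simulates that idea with plain integers standing in for indirect-object identifiers. `LinkRemapDemo` and `Remap` are hypothetical names, and, like the code above, it assumes both page lists are flat and have the same length.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinkRemapDemo
{
    // oldPageIds: page IDs recorded before the placeholders were removed.
    // newPageIds: page IDs at the same positions after the real pages were added.
    // linkTargets: page IDs that the TOC links currently point to.
    public static List<int> Remap(
        IReadOnlyList<int> oldPageIds,
        IReadOnlyList<int> newPageIds,
        IEnumerable<int> linkTargets)
    {
        if (oldPageIds.Count != newPageIds.Count)
            throw new ArgumentException("Both page lists must have the same length.");

        // Position i in the old list corresponds to position i in the new list.
        var map = new Dictionary<int, int>();
        for (int i = 0; i < oldPageIds.Count; i++)
            map[oldPageIds[i]] = newPageIds[i];

        // Redirect every link from its placeholder page to the real page.
        return linkTargets.Select(target => map[target]).ToList();
    }
}
```

With old IDs `10, 11, 12, 13` (one TOC page followed by three placeholders) and new IDs `10, 21, 22, 23`, links that pointed at `11` and `13` end up pointing at `21` and `23`, while the TOC page keeps its own ID.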