How to merge Word documents with different headers using Open XML?

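I am trying to merge multiple documents into a single one by following the examples posted in another post. I am using AltChunk altChunk = new AltChunk(). When the documents are merged, the separate headers of each document do not seem to be retained. The merged document contains the headers of the first document that was merged. If the first document being merged contains no headers, then the rest of the newly merged document contains no headers either, and vice versa.

My question is: how can I preserve the different headers of the documents being merged?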

Merge multiple word documents into one Open Xml

using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace WordMergeProject
{
    public class Program
    {
        private static void Main(string[] args)
        {
            byte[] word1 = File.ReadAllBytes(@"..\..\word1.docx");
            byte[] word2 = File.ReadAllBytes(@"..\..\word2.docx");

            byte[] result = Merge(word1, word2);

            File.WriteAllBytes(@"..\..\word3.docx", result);
        }

        private static byte[] Merge(byte[] dest, byte[] src)
        {
            string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString();

            // Copy dest into an expandable stream; a MemoryStream created
            // directly over the byte array could not grow when the package is saved.
            var memoryStreamDest = new MemoryStream();
            memoryStreamDest.Write(dest, 0, dest.Length);
            memoryStreamDest.Seek(0, SeekOrigin.Begin);
            var memoryStreamSrc = new MemoryStream(src);

            using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStreamDest, true))
            {
                MainDocumentPart mainPart = doc.MainDocumentPart;
                Body body = mainPart.Document.Body;
                AlternativeFormatImportPart altPart =
                    mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
                altPart.FeedData(memoryStreamSrc);
                var altChunk = new AltChunk();
                altChunk.Id = altChunkId;

                // Insert after the last chunk already present or, failing that, after the last paragraph.
                var lastElem = body.Elements().LastOrDefault(e => e is AltChunk)
                               ?? body.Elements().LastOrDefault(e => e is Paragraph);

                // Insert a page break
                Paragraph pageBreakP = new Paragraph();
                Run pageBreakR = new Run();
                Break pageBreakBr = new Break() { Type = BreakValues.Page };

                pageBreakP.Append(pageBreakR);
                pageBreakR.Append(pageBreakBr);

                // Place the page break, then the chunk right after it, and save the document part.
                if (lastElem != null)
                {
                    body.InsertAfter(pageBreakP, lastElem);
                }
                else
                {
                    body.PrependChild(pageBreakP);
                }
                body.InsertAfter(altChunk, pageBreakP);
                mainPart.Document.Save();
            }

            return memoryStreamDest.ToArray();
        }
    }
}

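I ran into this problem a few years ago and spent quite some time on it; I eventually wrote a blog article that links to a sample file. Integrating files with headers and footers using AltChunk is not straightforward. I'll try to cover the essentials here. Depending on what kinds of content the headers and footers contain (and assuming Microsoft hasn't fixed any of the problems I originally ran into), it may not be possible to rely solely on AltChunk.

(Note also that there may be tools/APIs that can handle this. I don't know of any, and asking on this site would be off-topic.)

Background

Before tackling the problem, it helps to understand how Word handles different headers and footers. To get a feel for it, start Word...

Section breaks / unlinking headers and footers

  • Type some text on the page and insert a header
  • Move the focus to the end of the page and go to the Page Layout tab in the ribbon
  • Page Setup / Breaks / Next Page section break
  • Go into the Header area for this page and note the information in the blue "tags": you'll see a section identifier on the left and "Same as Previous" on the right. "Same as Previous" is the default; to create a different Header, click the "Link to Previous" button in the Header

So, the rule is: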

a section break is required, with unlinked headers (and/or footers), in order to have different header/footer content within a document.
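
In Open XML terms, a section break inside the body is a sectPr element stored in the paragraph properties of the section's last paragraph, and a section is "linked to previous" for a given header type when its sectPr carries no headerReference of that type. The following is only a minimal sketch of that rule, not code from the sample file: the class and method names are hypothetical, and the relationship id passed in is assumed to belong to a HeaderPart that already exists in the document.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class SectionBreakSketch
{
    // Hypothetical helper: builds the paragraph that ends a section and gives
    // that section its own default header, i.e. "unlinked" from the previous one.
    // headerRelationshipId is assumed to be the id of an existing HeaderPart.
    public static Paragraph CreateUnlinkedSectionBreak(string headerRelationshipId)
    {
        var sectionProperties = new SectionProperties(
            // An explicit headerReference is what makes the header section-specific.
            new HeaderReference { Type = HeaderFooterValues.Default, Id = headerRelationshipId },
            new SectionType { Val = SectionMarkValues.NextPage });

        // A section break inside the body lives in the paragraph properties.
        return new Paragraph(new ParagraphProperties(sectionProperties));
    }
}
```

The returned paragraph would then be added to the body as the last paragraph of the section it closes; a footer can be handled the same way with a FooterReference.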

Master/Sub-documents

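Word has an (in)famous feature called "Master Document" that enables linking outside ("sub") documents into a "master" document. Doing so automatically adds the necessary section breaks and unlinks the headers/footers so that the originals are retained.

  • Go to Word's Outline view
  • Click "Show Document"
  • Use "Insert" to insert other files

Notice that two section breaks are inserted, one of type "Next page" and the other "Continuous". The first belongs to the file being inserted; the second is in the "master" file.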

Two section breaks are necessary when inserting a file because the last paragraph mark (which contains the section break for the end of the document) is not carried over to the target document. The section break in the target document carries the information to unlink the in-coming header from those already in the target document.

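When a Master document is saved, closed and re-opened, the sub-documents are in a "collapsed" state (file names as hyperlinks instead of the content). They can be expanded by going back to the Outline view and clicking the "Expand" button. To fully incorporate a sub-document into the document, click the icon at the top left next to the sub-document, then click "Unlink".

Merging Word Open XML files

This, then, is the type of environment the Open XML SDK needs to create when merging files whose headers and footers need to be retained. Theoretically, either approach should work. In practice, I had problems when using only section breaks; I've never tested using the Master Document feature in Word Open XML.

Inserting section breaks

Here's the basic code for inserting a section break and unlinking the headers before bringing in a file using AltChunk. Judging from my old posts and articles, it works as long as no complex page numbering is involved: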

private void btnMergeWordDocs_Click(object sender, EventArgs e)
{
    string sourceFolder = @"C:\Test\MergeDocs\";
    string targetFolder = @"C:\Test\";

    string altChunkIdBase = "acID";
    int altChunkCounter = 1;
    string altChunkId = altChunkIdBase + altChunkCounter.ToString();

    MainDocumentPart wdDocTargetMainPart = null;
    Document docTarget = null;
    AlternativeFormatImportPartType afType;
    AlternativeFormatImportPart chunk = null;
    AltChunk ac = null;
    using (WordprocessingDocument wdPkgTarget = WordprocessingDocument.Create(targetFolder + "mergedDoc.docx", DocumentFormat.OpenXml.WordprocessingDocumentType.Document, true))
    {
        //Will create document in 2007 Compatibility Mode.
        //In order to make it 2010 a Settings part must be created and a CompatMode element for the Office version set.
        wdDocTargetMainPart = wdPkgTarget.MainDocumentPart;
        if (wdDocTargetMainPart == null)
        {
            wdDocTargetMainPart = wdPkgTarget.AddMainDocumentPart();
            Document wdDoc = new Document(
                new Body(
                    new Paragraph(
                        new Run(new Text() { Text = "First Para" })),
                        new Paragraph(new Run(new Text() { Text = "Second para" })),
                        new SectionProperties(
                            new SectionType() { Val = SectionMarkValues.NextPage },
                            new PageSize() { Code = 9 },
                            new PageMargin() { Gutter = 0, Bottom = 1134, Top = 1134, Left = 1318, Right = 1318, Footer = 709, Header = 709 },
                            new Columns() { Space = "708" },
                            new TitlePage())));
            wdDocTargetMainPart.Document = wdDoc;
        }
        docTarget = wdDocTargetMainPart.Document;
        SectionProperties secPropLast = docTarget.Body.Descendants<SectionProperties>().Last();
        SectionProperties secPropNew = (SectionProperties)secPropLast.CloneNode(true);
        //A section break must be in a ParagraphProperty
        Paragraph lastParaTarget = (Paragraph)docTarget.Body.Descendants<Paragraph>().Last();
        ParagraphProperties paraPropTarget = lastParaTarget.ParagraphProperties;
        if (paraPropTarget == null)
        {
            //ParagraphProperties must be the first child of the paragraph.
            //Only a newly created instance is inserted; an existing one is already part of the tree.
            paraPropTarget = new ParagraphProperties();
            lastParaTarget.InsertAt(paraPropTarget, 0);
        }
        paraPropTarget.Append(secPropNew);

        //Process the individual files in the source folder.
        //Note that this process will permanently change the files by adding a section break.
        System.IO.DirectoryInfo di = new System.IO.DirectoryInfo(sourceFolder);
        IEnumerable<System.IO.FileInfo> docFiles = di.EnumerateFiles();
        foreach (System.IO.FileInfo fi in docFiles)
        {
            using (WordprocessingDocument pkgSourceDoc = WordprocessingDocument.Open(fi.FullName, true))
            {
                IEnumerable<HeaderPart> partsHeader = pkgSourceDoc.MainDocumentPart.GetPartsOfType<HeaderPart>();
                IEnumerable<FooterPart> partsFooter = pkgSourceDoc.MainDocumentPart.GetPartsOfType<FooterPart>();
                //If the source document has headers or footers we want to retain them.
                //This requires inserting a section break at the end of the document.
                if (partsHeader.Count() > 0 || partsFooter.Count() > 0)
                {
                    Body sourceBody = pkgSourceDoc.MainDocumentPart.Document.Body;
                    SectionProperties docSectionBreak = sourceBody.Descendants<SectionProperties>().Last();
                    //Make a copy of the document section break as this won't be imported into the target document.
                    //It needs to be appended to the last paragraph of the document
                    SectionProperties copySectionBreak = (SectionProperties)docSectionBreak.CloneNode(true);
                    Paragraph lastpara = sourceBody.Descendants<Paragraph>().Last();
                    ParagraphProperties paraProps = lastpara.ParagraphProperties;
                    if (paraProps == null)
                    {
                        paraProps = new ParagraphProperties();
                        //ParagraphProperties must come first in the paragraph, before any runs.
                        lastpara.PrependChild(paraProps);
                    }
                    paraProps.Append(copySectionBreak);
                }
                pkgSourceDoc.MainDocumentPart.Document.Save();
            }
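            //Insert the source file into the target file using AltChunk
            afType = AlternativeFormatImportPartType.WordprocessingML;
            chunk = wdDocTargetMainPart.AddAlternativeFormatImportPart(afType, altChunkId);
            //Dispose the stream once the data has been copied so the source file is released.
            using (System.IO.FileStream fsSourceDocument = new System.IO.FileStream(fi.FullName, System.IO.FileMode.Open))
            {
                chunk.FeedData(fsSourceDocument);
            }
            //Create the chunk
            ac = new AltChunk();
            //Link it to the part
            ac.Id = altChunkId;
            //Insert after the previous chunk (or after the last body paragraph for the first file)
            //so that the files keep the order in which they were processed.
            DocumentFormat.OpenXml.OpenXmlElement insertAfter = docTarget.Body.Elements<AltChunk>().LastOrDefault();
            if (insertAfter == null)
            {
                insertAfter = docTarget.Body.Elements<Paragraph>().Last();
            }
            docTarget.Body.InsertAfter(ac, insertAfter);
            docTarget.Save();
            altChunkCounter += 1;
            altChunkId = altChunkIdBase + altChunkCounter.ToString();
            chunk = null;
            ac = null;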
        }
    }
}

If complex page numbering is involved (quoting from my blog article):

Unfortunately, there’s a bug in the Word application when integrating Word document “chunks” into the main document. The process has the nasty habit of not retaining a number of SectionProperties, among them the one that sets whether a section has a Different First Page () and the one to restart Page Numbering () in a section. As long as your documents don’t need to manage these kinds of headers and footers you can probably use the “altChunk” approach.

But if you do need to handle complex headers and footers, the only method currently available to you is to copy in each document in its entirety, part by part. This is a non-trivial undertaking, as there are numerous possible types of Parts that can be associated not only with the main document body, but also with each header and footer part.
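
One rough way to decide up front whether a source document falls into the "complex" category is to scan its section properties before merging. The sketch below is an illustration rather than part of the original article: the class and method names are hypothetical, and it assumes that the two settings named in the quote correspond to the SDK's TitlePage class (different first page) and to a PageNumberType whose Start attribute is set (restarted numbering).

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AltChunkPrecheck
{
    // Hypothetical check: returns true when any section in the body declares a
    // different first page (TitlePage) or restarts page numbering
    // (PageNumberType with Start set). It does not inspect the value of
    // TitlePage, so an explicit "off" setting is also reported.
    public static bool UsesFragileSectionSettings(Body body)
    {
        return body.Descendants<SectionProperties>().Any(section =>
            section.Elements<TitlePage>().Any() ||
            section.Elements<PageNumberType>().Any(n => n.Start != null));
    }
}
```

It could be called with pkgSourceDoc.MainDocumentPart.Document.Body inside the loop over the source files; a true result suggests that the AltChunk route may lose those settings for that file.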

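...or try the Master/Sub-document approach.

Master/Sub-documents

This approach will certainly retain all the information. The result opens as a Master document, however, and the Word API (either the user or automation code) is needed to "unlink" the sub-documents in order to turn it into a single, integrated document.

Opening a Master Document file in the Open XML SDK Productivity Tool shows that inserting sub-documents into the master document is a fairly straightforward procedure:

The underlying Word Open XML for a document with one sub-document: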

<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:pPr>
      <w:pStyle w:val="Heading1" />
    </w:pPr>
    <w:subDoc r:id="rId6" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" />
  </w:p>
  <w:sectPr>
    <w:headerReference w:type="default" r:id="rId7" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" />
    <w:type w:val="continuous" />
    <w:pgSz w:w="11906" w:h="16838" />
    <w:pgMar w:top="1417" w:right="1417" w:bottom="1134" w:left="1417" w:header="708" w:footer="708" w:gutter="0" />
    <w:cols w:space="708" />
    <w:docGrid w:linePitch="360" />
  </w:sectPr>
</w:body>

and the code:

public class GeneratedClass
{
    // Creates an Body instance and adds its children.
    public Body GenerateBody()
    {
        Body body1 = new Body();

        Paragraph paragraph1 = new Paragraph();

        ParagraphProperties paragraphProperties1 = new ParagraphProperties();
        ParagraphStyleId paragraphStyleId1 = new ParagraphStyleId(){ Val = "Heading1" };

        paragraphProperties1.Append(paragraphStyleId1);
        SubDocumentReference subDocumentReference1 = new SubDocumentReference(){ Id = "rId6" };

        paragraph1.Append(paragraphProperties1);
        paragraph1.Append(subDocumentReference1);

        SectionProperties sectionProperties1 = new SectionProperties();
        HeaderReference headerReference1 = new HeaderReference(){ Type = HeaderFooterValues.Default, Id = "rId7" };
        SectionType sectionType1 = new SectionType(){ Val = SectionMarkValues.Continuous };
        PageSize pageSize1 = new PageSize(){ Width = (UInt32Value)11906U, Height = (UInt32Value)16838U };
        PageMargin pageMargin1 = new PageMargin(){ Top = 1417, Right = (UInt32Value)1417U, Bottom = 1134, Left = (UInt32Value)1417U, Header = (UInt32Value)708U, Footer = (UInt32Value)708U, Gutter = (UInt32Value)0U };
        Columns columns1 = new Columns(){ Space = "708" };
        DocGrid docGrid1 = new DocGrid(){ LinePitch = 360 };

        sectionProperties1.Append(headerReference1);
        sectionProperties1.Append(sectionType1);
        sectionProperties1.Append(pageSize1);
        sectionProperties1.Append(pageMargin1);
        sectionProperties1.Append(columns1);
        sectionProperties1.Append(docGrid1);

        body1.Append(paragraph1);
        body1.Append(sectionProperties1);
        return body1;
    }
}
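
The generated code refers to the relationship id "rId6" without showing where that relationship comes from. The following untested sketch shows one way it might be created with the SDK. The class and method names are hypothetical, and the relationship type URI is an assumption based on the naming pattern of the other officeDocument relationship types, so it should be checked against a master document saved by Word.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SubDocumentSketch
{
    // Assumed relationship type for sub-documents; verify against a file produced by Word.
    private const string SubDocumentRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/subDocument";

    // Hypothetical helper: registers the external file with the main part and
    // returns a paragraph whose subDoc element points at that relationship.
    public static Paragraph CreateSubDocumentLink(MainDocumentPart mainPart, Uri subDocumentUri)
    {
        ExternalRelationship relationship =
            mainPart.AddExternalRelationship(SubDocumentRelationshipType, subDocumentUri);

        // The id is generated by the SDK instead of being hard-coded as "rId6".
        return new Paragraph(new SubDocumentReference { Id = relationship.Id });
    }
}
```

The caller still has to place the returned paragraph in the body and add the section breaks that Word inserts around a sub-document; this sketch only creates the link.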