Merged document using AltChunks has empty InnerText

Question

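I am trying to merge multiple documents into a single one, then open the resulting document and process it further.

"ChunkId" is a property that is incremented every time this method is called, so that each chunk gets a unique id. I followed the example from this site. This is the code used to merge the documents (using altChunks): `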

private void MergeDocument(string mergePath, bool appendPageBreak)
    {
        if (!File.Exists(mergePath))
        {
            Log.Warn(string.Format("Document: \"{0}\" was not found.", mergePath));
            return;
        }

        ChunkId++;
        var altChunkId = "AltChunkId" + ChunkId;

        var mainDocPart = DestinationDocument.MainDocumentPart;
        if (mainDocPart == null)
        {
            DestinationDocument.AddMainDocumentPart();
            mainDocPart = DestinationDocument.MainDocumentPart;
            if (mainDocPart.Document == null)
                mainDocPart.Document = new Document { Body = new Body() };
        }

        try
        {
            var chunk = mainDocPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            if (chunk != null)
                using (var ms = new FileStream(mergePath, FileMode.Open, FileAccess.Read))
                {
                    chunk.FeedData(ms);
                }
            else
            {
                Log.Error(string.Format("Merge - Failed to create chunk document based on \"{0}\".", mergePath));
                return; // failed to create chunk document, return from merge method

            }
        }
        catch (Exception e)
        {
            Log.Error(string.Format("Merge - Failed to insert chunk document based on \"{0}\". Message: \"{1}\".", mergePath, e.Message));
            return; // failed to insert chunk document, return from merge method

        }

        var altChunk = new AltChunk { Id = altChunkId };

        //append the page break
        if (appendPageBreak)
            try
            {
                AppendPageBreak(mainDocPart);
                Log.Info("Successfully appended page break.");
            }
            catch (Exception ex)
            {
                Log.Error(string.Format("Error appending page break. Message: \"{0}\".", ex.Message));
                return; // return if page break insertion failed
            }

        // insert the document 
        var last = mainDocPart.Document
        .Body
        .Elements()
        .LastOrDefault(e => e is Paragraph || e is AltChunk);
        try
        {
            if (last == null)
                mainDocPart.Document.Body.InsertAt(altChunk, 0);
            else
                last.InsertAfterSelf(altChunk);
            Log.Info(string.Format("Successfully inserted new doc \"{0}\" into destination.", mergePath));
        }
        catch (Exception ex)
        {
            Log.Error(string.Format("Error merging document \"{0}\". Message: \"{1}\".", mergePath, ex.Message));
            return; // return if the merge was not successful
        }

        try
        {
            mainDocPart.Document.Save();
        }
        catch (Exception ex)
        {
            Log.Error(string.Format("Error saving document \"{0}\". Message: \"{1}\".", mergePath, ex.Message));
        }
    }`

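If I open the merged document in Word, I can see its content (tables, text, paragraphs...), but if I open it again from code, the inner text is "" (an empty string). I need the inner text to reflect what the document contains, because I have to replace placeholders such as "@@name@@" with other text, and I cannot do that while the inner text is empty.

This is the inner XML of the merged document,

This is how I open the merged document: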

DestinationDocument = WordprocessingDocument.Open(Path.GetFullPath(destinationPath), true);

How can I read the document's inner text? Or how can I merge these documents into a single file so that this problem no longer occurs?

Answer 1

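When documents are merged with AltChunks, each one is like an embedded attachment inside the original Word document. The client (MS Word) handles rendering the altChunk parts. As a result, the generated document does not contain the Open XML markup of the merged documents.

If you want to use the generated document for further programmatic post-processing, use Open XML PowerTools. Please refer to my answer here.

Open XML PowerTools - https://github.com/OfficeDev/Open-Xml-PowerTools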
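
As a rough illustration of that suggestion, here is a minimal sketch that merges files with the DocumentBuilder class from Open XML PowerTools instead of altChunks, so the result contains ordinary WordprocessingML that can be searched afterwards. The class name PowerToolsMerge and the method Merge are made up for this example; the second argument to Source (keepSections) is set to true here only as an assumption about what you want for section breaks.

```csharp
using System.Collections.Generic;
using System.Linq;
using OpenXmlPowerTools;

public static class PowerToolsMerge
{
    // Hypothetical helper: builds one real document out of several DOCX files.
    public static void Merge(IEnumerable<string> sourcePaths, string destinationPath)
    {
        // Each Source wraps one input document; 'true' keeps its section properties.
        var sources = sourcePaths
            .Select(path => new Source(new WmlDocument(path), true))
            .ToList();

        // Writes the combined document; no altChunk elements are involved.
        DocumentBuilder.BuildDocument(sources, destinationPath);
    }
}
```

Usage would be something like PowerToolsMerge.Merge(new[] { "a.docx", "b.docx" }, "merged.docx"); after that, the body of merged.docx can be opened with the Open XML SDK and its text read directly.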

Answer 2

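The problem is that the documents are not really merged (as such); the altChunk element only defines where the alternative content should be placed in the document, and it holds a reference to that alternative content.
When you open this document in MS Word, it automatically merges all of that alternative content for you. So when you re-save the document with MS Word, it no longer contains altChunk elements.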
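
To see that reference for yourself, a small sketch along these lines lists the part that each altChunk element in the body points to. It assumes the Open XML SDK (DocumentFormat.OpenXml); the names AltChunkInspector and ListAltChunkTargets are hypothetical and only used for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AltChunkInspector
{
    // Hypothetical helper: returns the package URI of the part behind each altChunk.
    public static List<string> ListAltChunkTargets(string path)
    {
        using (var doc = WordprocessingDocument.Open(path, false))
        {
            var main = doc.MainDocumentPart;

            // Each AltChunk element only carries a relationship id;
            // the actual content lives in the part that id resolves to.
            return main.Document.Body
                .Descendants<AltChunk>()
                .Select(chunk => main.GetPartById(chunk.Id.Value).Uri.OriginalString)
                .ToList();
        }
    }
}
```

If the list is non-empty while the body's InnerText is empty, the text you are looking for is still sitting inside those referenced parts rather than in the main document.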

Nevertheless, you can actually manipulate those altChunk DOCX files (the child DOCX documents) just as you would the main DOCX file (the parent document).

For example:

string destinationPath = "Sample.docx";
string search = "@@name@@";
string replace = "John Doe";

using (var parent = WordprocessingDocument.Open(Path.GetFullPath(destinationPath), true))
{
    foreach (var altChunk in parent.MainDocumentPart.GetPartsOfType<AlternativeFormatImportPart>())
    {
        // Only DOCX chunks can be opened as a WordprocessingDocument.
        if (Path.GetExtension(altChunk.Uri.OriginalString) != ".docx")
            continue;

        using (var child = WordprocessingDocument.Open(altChunk.GetStream(), true))
        {
            var foundText = child.MainDocumentPart.Document.Body
                .Descendants<Text>()
                .Where(t => t.Text.Contains(search))
                .FirstOrDefault();

            if (foundText != null)
            {
                foundText.Text = foundText.Text.Replace(search, replace);
                break;
            }
        }
    }
}
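
The code above stops after the first match. If the placeholder can occur several times, or in more than one child document, a variant along the following lines may be closer to what you need. This is only a sketch under the same assumptions (Open XML SDK, DOCX chunks); AltChunkReplacer and ReplaceInChildren are hypothetical names, and like the original it only finds placeholders that sit entirely inside a single text element.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AltChunkReplacer
{
    // Hypothetical helper: replaces 'search' in every DOCX chunk of the parent
    // document and returns how many text elements were changed.
    public static int ReplaceInChildren(string parentPath, string search, string replace)
    {
        int changed = 0;

        using (var parent = WordprocessingDocument.Open(parentPath, true))
        {
            foreach (var part in parent.MainDocumentPart.GetPartsOfType<AlternativeFormatImportPart>())
            {
                // Skip chunks that are not embedded DOCX files (e.g. HTML or RTF).
                if (Path.GetExtension(part.Uri.OriginalString) != ".docx")
                    continue;

                using (var child = WordprocessingDocument.Open(part.GetStream(), true))
                {
                    // Materialize the matches before editing them.
                    var matches = child.MainDocumentPart.Document.Body
                        .Descendants<Text>()
                        .Where(t => t.Text.Contains(search))
                        .ToList();

                    foreach (var text in matches)
                    {
                        text.Text = text.Text.Replace(search, replace);
                        changed++;
                    }
                }
            }
        }

        return changed;
    }
}
```

A call such as AltChunkReplacer.ReplaceInChildren("Sample.docx", "@@name@@", "John Doe") would then update all child documents in one pass. Word sometimes splits a visible string across several runs, in which case neither version will find the placeholder.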

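Alternatively, you need an approach that truly merges the documents. Flowerking mentioned one solution; another you can try is the GemBox.Document library. It merges the alternative content for you when loading (just as MS Word does when opening the file).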

For example:

string destinationPath = "Sample.docx";
string search = "@@name@@";
string replace = "John Doe";

DocumentModel document = DocumentModel.Load(destinationPath);

ContentRange foundText = document.Content.Find(search).FirstOrDefault();
if (foundText != null)
    foundText.LoadText(replace);

document.Save(destinationPath);

c#

ms-word

openxml

openxml-sdk

wordprocessingml