Word found unreadable content in xxx.docx after splitting a docx using OpenXML

Question

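I have a full.docx that contains two math problems. The docx embeds some pictures and MathType equations (OLE objects). I split the document following this and got two files (first.docx and second.docx). first.docx works fine, but second.docx pops up a warning dialog when I try to open it: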

"Word found unreadable content in second.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes."

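After clicking "Yes", the document opens and its content is correct. What is wrong with second.docx? I have already checked it with the "Open XML SDK 2.5 Productivity Tool" but could not find the cause. Any help is much appreciated. Thanks.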

The three files have been uploaded here.

Here is some of the code:

        byte[] templateBytes = System.IO.File.ReadAllBytes(TEMPLATE_YANG_FILE);
        using (MemoryStream templateStream = new MemoryStream())
        {
            templateStream.Write(templateBytes, 0, (int)templateBytes.Length);

            string guidStr = Guid.NewGuid().ToString();

            using (WordprocessingDocument document = WordprocessingDocument.Open(templateStream, true))
            {
                document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);

                MainDocumentPart mainPart = document.MainDocumentPart;

                mainPart.Document = new Document();
                Body bd = new Body();

                foreach (DocumentFormat.OpenXml.Wordprocessing.Paragraph clonedParagrph in lst)
                {
                    bd.AppendChild<DocumentFormat.OpenXml.Wordprocessing.Paragraph>(clonedParagrph);

                    clonedParagrph.Descendants<Blip>().ToList().ForEach(blip =>
                    {
                        var newRelation = document.CopyImage(blip.Embed, this.wordDocument);
                        blip.Embed = newRelation;
                    });

                    clonedParagrph.Descendants<DocumentFormat.OpenXml.Vml.ImageData>().ToList().ForEach(imageData =>
                    {
                        var newRelation = document.CopyImage(imageData.RelationshipId, this.wordDocument);
                        imageData.RelationshipId = newRelation;
                    });
                }

                mainPart.Document.Body = bd;
                mainPart.Document.Save();
            }

            string subDocFile = System.IO.Path.Combine(this.outDir, guidStr + ".docx");
            this.subWordFileLst.Add(subDocFile);

            File.WriteAllBytes(subDocFile, templateStream.ToArray());
        }

The list `lst` contains paragraphs cloned from the original docx using:

(DocumentFormat.OpenXml.Wordprocessing.Paragraph)p.Clone();

Answer 1

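Using the Productivity Tool, I found that the oleObjectX.bin parts were not being copied, so I added the following code after the Blip and ImageData copying: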

clonedParagrph.Descendants<OleObject>().ToList().ForEach(ole =>
{
    var newRelation = document.CopyOleObject(ole.Id, this.wordDocument);
    ole.Id = newRelation;
});

That solved the problem.
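Note that `CopyOleObject` is not a method of the Open XML SDK. Like `CopyImage` in the question, it is presumably a custom extension method, and its body is not shown in this thread. The likely cause of the warning is that the cloned paragraphs still carry `r:id` values on their OLE object elements that point to relationships missing from the new package. Below is a minimal sketch of what such a helper could look like, assuming the OLE data is stored as an ordinary embedded-object part under the main document part. The class name and the parameter names are hypothetical.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class OleObjectCopyExtensions
{
    // Hypothetical helper (not part of the SDK): copies the part referenced by
    // relId from the source document into the target document and returns the
    // relationship id that the new part receives in the target.
    public static string CopyOleObject(
        this WordprocessingDocument target,
        string relId,
        WordprocessingDocument source)
    {
        // Resolve the relationship id inside the source main document part.
        OpenXmlPart sourcePart = source.MainDocumentPart.GetPartById(relId);

        // Create an empty embedded-object part with the same content type.
        // Assumption: the source part is a plain oleObject .bin part; an
        // embedded package (for example an embedded .xlsx) would need
        // AddEmbeddedPackagePart instead.
        EmbeddedObjectPart targetPart =
            target.MainDocumentPart.AddEmbeddedObjectPart(sourcePart.ContentType);

        // Copy the binary payload.
        using (Stream data = sourcePart.GetStream(FileMode.Open, FileAccess.Read))
        {
            targetPart.FeedData(data);
        }

        // The caller writes this id back onto the cloned OleObject element.
        return target.MainDocumentPart.GetIdOfPart(targetPart);
    }
}
```

With a helper along these lines in scope, the `ForEach` loop above assigns each cloned `OleObject` a relationship id that actually exists in the new package.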

ms-word

openxml

openxml-sdk