Changing the default text of a Plain Text Content Control in an existing .docx file

Question

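I was given a .docx template that I need to fill in from my Java application. Initially I planned to use Apache POI, because my previous task was to fill in an .xlsx template and that worked well. However, based on my research, docx4j is better suited to my case.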

My situation is that this .docx template uses Plain Text Content Controls.

After inspecting its XML structure, I see that the <w:sdt> sits directly under a <w:p>, which in turn sits directly under the <w:body> tag.

<w:body>
    ...
    <w:p w:rsidR="00ED05E8" w:rsidRPr="00DA4BE7" w:rsidRDefault="00AC5B37" w:rsidP="00BA6F7F">
        ...
        <w:sdt>
            <w:sdtPr>
                <w:rPr>
                    <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
                    <w:i/>
                    <w:sz w:val="24"/>
                    <w:szCs w:val="24"/>
                    <w:u w:val="single"/>
                </w:rPr>
                <w:alias w:val="Name of Office/Agency Name"/>
                <w:tag w:val="Name of Office/Agency Name"/>
                <w:id w:val="-781645881"/>
                <w:placeholder>
                    <w:docPart w:val="DefaultPlaceholder_-1854013440"/>
                </w:placeholder>
                <w:text/>
            </w:sdtPr>
            <w:sdtEndPr/>
            <w:sdtContent>
                <w:r w:rsidR="00340180" w:rsidRPr="00616BA5">
                    <w:rPr>
                        <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
                        <w:i/>
                        <w:sz w:val="24"/>
                        <w:szCs w:val="24"/>
                        <w:u w:val="single"/>
                    </w:rPr>
                    <w:t>(Name of Office/Agency Name)</w:t>
                </w:r>
            </w:sdtContent>
        </w:sdt>
        ...
    </w:p>
</w:body>

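I want to change the text in the <w:t> of the <w:sdt> from "(Name of Office/Agency Name)" to a different string. The problem is that I am stuck after these lines and do not know how to proceed: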

WordprocessingMLPackage document = WordprocessingMLPackage.load(new java.io.File(...));
MainDocumentPart mainDocument = document.getMainDocumentPart();

I have the w:id value of -781645881, but I do not know what to do with that information. Is this even the itemId mentioned in the ContentControlsXmlEdit sample class from the docx4j site?

Even with the following code, I cannot get hold of the <w:sdt> node:

String itemId = "-781645881".toLowerCase();
CustomXmlDataStoragePart customXmlDataStoragePart = (CustomXmlDataStoragePart)wordMLPackage.getCustomXmlDataStorageParts().get(itemId);
CustomXmlDataStorage customXmlDataStorage = customXmlDataStoragePart.getData();

What should I do to change the value of the Plain Text Content Control?

Answer 1

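I came up with this answer out of desperation, for the following reasons:

I do not yet have a good grasp of docx4j, or of .xml.

I extracted the exact .xml file of the .docx file I am working on.
There is no storeItemID anywhere in the .xml of my .docx file.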

Here is my utility class, written in Groovy:

import javax.xml.bind.JAXBElement
import org.docx4j.openpackaging.packages.WordprocessingMLPackage
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart
import org.docx4j.wml.CTBookmark
import org.docx4j.wml.P
import org.docx4j.wml.R
import org.docx4j.wml.SdtBlock
import org.docx4j.wml.SdtContent
import org.docx4j.wml.SdtRun
import org.docx4j.wml.Text

class WordReport {
    private WordprocessingMLPackage document
    private Map<String, String> contentControlMapping
    private Map<String, Object> reportArgs

    public WordReport(Map<String, Object> reportArgs) {
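        // NOTE: createPackage() returns a new, empty document. To fill an existing
        // template, load it instead: WordprocessingMLPackage.load(new java.io.File(...))
        document = WordprocessingMLPackage.createPackage()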
        this.reportArgs = reportArgs
    }

    public WordprocessingMLPackage exportReport() {
        return document
    }

    private String getNewMapping(String contentControlText)  {
        return contentControlMapping.get(contentControlText)
    }

    private boolean isMapped(String contentControlText) {
        return contentControlMapping.containsKey(contentControlText)
    }

    protected void mapNewMapping() {
        MainDocumentPart mainDocument = document.getMainDocumentPart()
        List<Object> nodes = mainDocument.getJAXBNodesViaXPath("//w:sdt", false)

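        String key
        nodes.each { n ->
            // Resolve the content per node so that a value left over from a
            // previous iteration is never reused.
            SdtContent content = null
            if(n instanceof SdtBlock) {
                content = n.getSdtContent()
            }
            else if(n instanceof JAXBElement) {
                if(n.getValue() instanceof SdtRun) {
                    content = n.getValue().getSdtContent()
                }
            }
            // Skip node types this class does not handle (acts like "continue").
            if(content == null) {
                return
            }

            content.getContent().each { sdtcc ->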
                if(sdtcc instanceof P) {
                    sdtcc.getContent().each { pc ->
                        pc.getContent().each { rc ->
                            println "rc.getValue().getClass(): " + rc.getValue().getClass()
                            if(rc.getValue() instanceof Text) {
                                key = rc.getValue().getValue()
                                isMapped(key) ? rc.getValue().setValue(getNewMapping(key)) : null
                            }
                            else if(rc.getValue() instanceof R) {
                                rc.getValue().getContent().each { rrc ->
                                    if(rrc instanceof JAXBElement) {
                                        key = rrc.getValue().getValue()
                                        isMapped(key) ? rrc.getValue().setValue(getNewMapping(key)) : null
                                    }
                                }
                            }
                        }
                    }
                }
                else if(sdtcc instanceof R) {
                    sdtcc.getContent().each { rc ->
                        if(rc instanceof JAXBElement) {
                            key = rc.getValue().getValue()
                            isMapped(key) ? rc.getValue().setValue(getNewMapping(key)) : null
                        }
                    }
                }
                else if(sdtcc instanceof JAXBElement) {
                    if(sdtcc.getValue() instanceof CTBookmark) {

                    }
                    else if(sdtcc.getValue() instanceof JAXBElement) {
                        key = sdtcc.getValue().getValue()
                        isMapped(key) ? sdtcc.getValue().setValue(getNewMapping(key)) : null
                    }
                }
            }
        }
    }

    public void setMapping(Map contentControlMapping) {
        this.contentControlMapping = contentControlMapping
    }
}

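The heart of this class is the mapNewMapping() method. What it basically does is apply the entries of the contentControlMapping variable to any <w:t> inside a <w:sdt>, whether that <w:t> sits in a <w:r> directly inside the control's content or deeper, inside a <w:p>, and so on. The keys of the map are the texts currently shown in the controls, and the values are the replacement strings. I use the getJAXBNodesViaXPath() method to retrieve the list of all <w:sdt> elements.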
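To make it easier to see what a "//w:sdt" query actually selects, here is a minimal sketch that runs the same XPath with the JDK's own XML APIs instead of docx4j. It assumes you already have the contents of word/document.xml as a string; SdtLister and listControls are hypothetical names made up for this illustration, not docx4j classes.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of docx4j): lists content controls with JDK APIs only.
class SdtLister {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns a map from each control's w:tag value to the text it currently holds. */
    static Map<String, String> listControls(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so the w: prefix can be resolved
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the "w" prefix used in the expressions below.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) { return null; }
            });

            Map<String, String> controls = new LinkedHashMap<>();
            NodeList sdts = (NodeList) xpath.evaluate("//w:sdt", doc, XPathConstants.NODESET);
            for (int i = 0; i < sdts.getLength(); i++) {
                Node sdt = sdts.item(i);
                String tag = xpath.evaluate("w:sdtPr/w:tag/@w:val", sdt);
                // Concatenate every w:t below this control's w:sdtContent.
                NodeList texts = (NodeList) xpath.evaluate(
                        "w:sdtContent//w:t", sdt, XPathConstants.NODESET);
                StringBuilder text = new StringBuilder();
                for (int j = 0; j < texts.getLength(); j++) {
                    text.append(texts.item(j).getTextContent());
                }
                controls.put(tag, text.toString());
            }
            return controls;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read document XML", e);
        }
    }
}
```

Run against the XML from the question, this would report one control whose tag is "Name of Office/Agency Name". Listing the tags first is a cheap way to decide which keys the mapping needs before changing anything.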

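Its limitation is that it only supports a limited set of combinations of P, R, CTBookmark, SdtBlock, SdtContent, and SdtRun. If a <w:t> appears in a complex or deeply nested piece of .xml that I did not anticipate, it will not be mapped. That is why I mentioned that I first read the .xml of the .docx file.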
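One possible way around the nesting limitation is to stop enumerating parent types and instead visit every <w:t>, checking whether it has a <w:sdt> ancestor. The sketch below does this with the JDK's DOM API on the raw word/document.xml string rather than with docx4j's object model; SdtTextReplacer and replaceMappedText are hypothetical names. It keeps the same assumption as the class above: the map key must equal the complete text of a single <w:t>.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of docx4j): replaces text inside content
// controls at any nesting depth, using JDK APIs only.
class SdtTextReplacer {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the document XML with mapped w:t texts inside w:sdt elements replaced. */
    static String replaceMappedText(String documentXml, Map<String, String> mapping) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            // Visit every w:t once; depth and parent types do not matter.
            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            for (int i = 0; i < texts.getLength(); i++) {
                Element t = (Element) texts.item(i);
                if (!isInsideSdt(t)) {
                    continue; // leave text outside content controls untouched
                }
                String replacement = mapping.get(t.getTextContent());
                if (replacement != null) {
                    t.setTextContent(replacement);
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process document XML", e);
        }
    }

    /** True if any ancestor of the node is a w:sdt element. */
    private static boolean isInsideSdt(Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (W_NS.equals(p.getNamespaceURI()) && "sdt".equals(p.getLocalName())) {
                return true;
            }
        }
        return false;
    }
}
```

A call such as replaceMappedText(xml, Map.of("(Name of Office/Agency Name)", "My Agency")) would return the modified XML, which still has to be written back into the .docx package. Word sometimes splits one visible string across several runs, in which case a whole-string match like this finds nothing. docx4j also ships a TraversalUtil class intended for walking the object tree, which may be the more natural route if you stay within docx4j.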

java

ms-office

docx4j