在 dotx/docx 文件中编辑 header

Question

我目前正在尝试从现有的 dotx 格式模板生成一个新的 docx 文件。我想更改 header 中的名字、姓氏等，但由于某种原因我无法访问它们... 我的方法如下：

 public void generateDocX(Long id) throws IOException, InvalidFormatException {

    //Get user per id
    EmployeeDTO employeeDTO = employeeService.getEmployee(id);

    //Location where the new docx file will be saved
    FileOutputStream outputStream = new FileOutputStream(new File("/home/user/Documents/project/src/main/files/" + employeeDTO.getId() + "header.docx"));

    //Get the template for generating the new docx file
    File template = new File("/home/user/Documents/project/src/main/files/template.dotx");
    OPCPackage pkg = OPCPackage.open(template);
    XWPFDocument document = new XWPFDocument(pkg);

    for (XWPFHeader header : document.getHeaderList()) {
        List<XWPFParagraph> paragraphs = header.getParagraphs();
        System.out.println("Total paragraphs in header are: " + paragraphs.size());
        System.out.println("Total elements in the header are: " + header.getBodyElements().size());
        for (XWPFParagraph paragraph : paragraphs) {
            System.out.println("Paragraph text is: " + paragraph.getText());
            List<XWPFRun> runs = paragraph.getRuns();
            for (XWPFRun run : runs) {
                String runText = run.getText(run.getTextPosition());
                System.out.println("Run text is: " + runText);
            }
        }
    }

    //Write the changes to the new docx file and close the document
    document.write(outputStream);
    document.close();
}

控制台中的输出是 1、null 或空字符串...我尝试了 , here and here 中的几种方法，但没有任何运气...

这是 template.dotx

里面的内容

Answer 1

IBody.getParagraphs and IBody.getBodyElements- 只获取直接在 IBody 中的段落或 body 元素。但是您的段落不直接在其中，而是在单独的文本框或文本框中。这就是为什么他们不能通过这种方式获得。

由于 *.docx 是一个 ZIP 存档，包含文档、header 和页脚的 XML 文件，因此可以获得一个 [=14= 的所有文本运行] 通过创建 XmlCursor 选择所有 w:r XML 元素。对于 XWPFHeader 这可能看起来像这样：

 private List<XmlObject> getAllCTRs(XWPFHeader header) {
  CTHdrFtr ctHdrFtr = header._getHdrFtr();
  XmlCursor cursor = ctHdrFtr.newCursor();
  cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
  List<XmlObject> ctrInHdrFtr = new ArrayList<XmlObject>();
  while (cursor.hasNextSelection()) {
   cursor.toNextSelection();
   XmlObject obj = cursor.getObject();
   ctrInHdrFtr.add(obj);
  }
  return ctrInHdrFtr;
 }

现在我们有一个包含 header 中所有 XML 个元素的列表，这些元素是 Word 中的 text-run-elements 个。

我们可以有一个更通用的 getAllCTRs，它从任何类型的 IBody 中获取所有 CTR 元素，如下所示：

 private List<XmlObject> getAllCTRs(IBody iBody) {
  XmlCursor cursor = null;
  List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();

  if (iBody instanceof XWPFHeaderFooter) {
   XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
   CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
   cursor = ctHdrFtr.newCursor();
  } else if (iBody instanceof XWPFDocument) {
   XWPFDocument document = (XWPFDocument)iBody;
   CTDocument1 ctDocument1 = document.getDocument();
   cursor = ctDocument1.newCursor();
  } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
   XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
   CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
   cursor = ctFtnEdn.newCursor();
  }

  if (cursor != null) {
   cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
   while (cursor.hasNextSelection()) {
    cursor.toNextSelection();
    XmlObject obj = cursor.getObject();
    ctrInIBody.add(obj);
   }
  }
  return ctrInIBody ;
 }

现在我们有一个包含 IBody 中所有 XML 元素的列表，这些元素是 Word 中的 text-run-elements。

有了这些我们就可以像这样从中获取文本：

 private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
  List<XmlObject> ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    System.out.println(text);
   }
  }
 }

这可能显示了下一个挑战。因为Word在创建text-run-elements的时候很乱。例如，您的占位符 <<Firstname>> 可以拆分为 text-runs << + Firstname + >>。原因可能是不同的格式或拼写检查或其他原因。即使这样也是可能的：<< + Lastname + >>; << + YearOfBirth + >>。甚至这样：<<Firstname + >> << + Lastname>>; << + YearOfBirth>>。你看，用文本替换占位符几乎是不可能的，因为占位符可能被拆分成多个 tex-runs。

为避免这种情况，template.dotx 需要由知道自己在做什么的用户创建。

起初turn spell check off。语法检查也是如此。如果没有，所有发现的可能的拼写错误或语法违规都在单独的 text-runs 中以相应地标记它们。

其次确保整个占位符的格式相同。不同格式的文本也必须分开 text-runs.

我真的很怀疑这能否正常工作。不过你自己试试吧。

完整示例：

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;

import java.util.List;
import java.util.ArrayList;

public class WordEditAllIBodys {

 private List<XmlObject> getAllCTRs(IBody iBody) {
  XmlCursor cursor = null;
  List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();

  if (iBody instanceof XWPFHeaderFooter) {
   XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
   CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
   cursor = ctHdrFtr.newCursor();
  } else if (iBody instanceof XWPFDocument) {
   XWPFDocument document = (XWPFDocument)iBody;
   CTDocument1 ctDocument1 = document.getDocument();
   cursor = ctDocument1.newCursor();
  } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
   XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
   CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
   cursor = ctFtnEdn.newCursor();
  }

  if (cursor != null) {
   cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
   while (cursor.hasNextSelection()) {
    cursor.toNextSelection();
    XmlObject obj = cursor.getObject();
    ctrInIBody.add(obj);
   }
  }
  return ctrInIBody ;
 }

 private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
  List<XmlObject> ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    System.out.println(text);
   }
  }
 }

 private void replaceTextInTextRunsOfIBody(IBody iBody, String placeHolder, String textValue) throws Exception {
  List<XmlObject> ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    if (text != null && text.contains(placeHolder)) {
     text = text.replace(placeHolder, textValue);
     ctText.setStringValue(text);
     obj.set(ctr);
    }
   }
  }
 }

 public void generateDocX() throws Exception {

  FileOutputStream outputStream = new FileOutputStream(new File("./" + 1234 + "header.docx"));

  //Get the template for generating the new docx file
  File template = new File("./template.dotx");
  XWPFDocument document = new XWPFDocument(new FileInputStream(template));

  //traverse all headers
  for (XWPFHeader header : document.getHeaderList()) {
   printAllTextInTextRunsOfIBody(header);

   replaceTextInTextRunsOfIBody(header, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(header, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(header, "<<ProfessionalTitle>>", "Skeptic");
  }  

  //traverse all footers
  for (XWPFFooter footer : document.getFooterList()) {
   printAllTextInTextRunsOfIBody(footer);

   replaceTextInTextRunsOfIBody(footer, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(footer, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(footer, "<<ProfessionalTitle>>", "Skeptic");
  }  

  //traverse document body; note: tables needs not be traversed separately because they are in document body
  printAllTextInTextRunsOfIBody(document);

  replaceTextInTextRunsOfIBody(document, "<<Firstname>>", "Axel");
  replaceTextInTextRunsOfIBody(document, "<<Lastname>>", "Richter");
  replaceTextInTextRunsOfIBody(document, "<<ProfessionalTitle>>", "Skeptic");

  //traverse all footnotes
  for (XWPFFootnote footnote : document.getFootnotes()) {
   printAllTextInTextRunsOfIBody(footnote);

   replaceTextInTextRunsOfIBody(footnote, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(footnote, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(footnote, "<<ProfessionalTitle>>", "Skeptic");
  }  

  //traverse all endnotes
  for (XWPFEndnote endnote : document.getEndnotes()) {
   printAllTextInTextRunsOfIBody(endnote);

   replaceTextInTextRunsOfIBody(endnote, "<<Firstname>>", "Axel");
   replaceTextInTextRunsOfIBody(endnote, "<<Lastname>>", "Richter");
   replaceTextInTextRunsOfIBody(endnote, "<<ProfessionalTitle>>", "Skeptic");
  }  


  //since document was opened from *.dotx the content type needs to be changed
  document.getPackage().replaceContentType(
   "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
   "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");

  //Write the changes to the new docx file and close the document
  document.write(outputStream);
  outputStream.close();
  document.close();
 }

 public static void main(String[] args) throws Exception {
  WordEditAllIBodys app = new WordEditAllIBodys();
  app.generateDocX();
 }
}

顺便说一句：由于您的文档是从 *.dotx 打开的，因此内容类型需要从 wordprocessingml.template 更改为 wordprocessingml.document。否则 Word 将不会打开生成的 *.docx 文档。参见 Converting a file with ".dotx" extension (template) to "docx" (Word File)。

由于我对 replacing-placeholder-text-approach 持怀疑态度，所以我更喜欢填写表格。参见 Problem with processing word document java。当然，这样的表单字段不能用在 header 或页脚中。所以 header s 或页脚应该从头开始创建。

在 dotx/docx 文件中编辑 header

Edit header in dotx/docx file

java

apache-poi

apache-poi-4