Edit header in dotx/docx file
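I am currently trying to generate a new docx file from an existing dotx template. I want to change the first name, last name, etc. in the header, but for some reason I cannot access them...
My method looks like this: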
public void generateDocX(Long id) throws IOException, InvalidFormatException {
    //Get user per id
    EmployeeDTO employeeDTO = employeeService.getEmployee(id);
    //Location where the new docx file will be saved
    FileOutputStream outputStream = new FileOutputStream(new File("/home/user/Documents/project/src/main/files/" + employeeDTO.getId() + "header.docx"));
    //Get the template for generating the new docx file
    File template = new File("/home/user/Documents/project/src/main/files/template.dotx");
    OPCPackage pkg = OPCPackage.open(template);
    XWPFDocument document = new XWPFDocument(pkg);
    for (XWPFHeader header : document.getHeaderList()) {
        List<XWPFParagraph> paragraphs = header.getParagraphs();
        System.out.println("Total paragraphs in header are: " + paragraphs.size());
        System.out.println("Total elements in the header are: " + header.getBodyElements().size());
        for (XWPFParagraph paragraph : paragraphs) {
            System.out.println("Paragraph text is: " + paragraph.getText());
            List<XWPFRun> runs = paragraph.getRuns();
            for (XWPFRun run : runs) {
                String runText = run.getText(run.getTextPosition());
                System.out.println("Run text is: " + runText);
            }
        }
    }
    //Write the changes to the new docx file and close the document
    document.write(outputStream);
    document.close();
}
The output in the console is 1, null or empty strings... I tried several of the approaches from the linked posts (here and here), but without any luck...
This is what template.dotx contains:
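IBody.getParagraphs and IBody.getBodyElements only get the paragraphs or body elements that are directly in the IBody. But your paragraphs are not directly in it; they are in separate text boxes or text frames. That is why they cannot be obtained this way.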
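To make the difference concrete, here is a minimal JDK-only sketch (no Apache POI) that parses a hypothetical, heavily simplified header part in which the only top-level paragraph contains a text box holding a second paragraph. It counts the w:p elements that are direct children of w:hdr (roughly what getParagraphs sees) and all w:p elements at any depth. The class name and the sample XML are assumptions for illustration; real Word markup for text boxes is nested more deeply.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class DirectVsNestedParagraphs {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical, heavily simplified header part: the only direct paragraph holds a run
    // whose w:pict contains a text box (w:txbxContent) with a paragraph of its own.
    // Real Word markup nests text boxes more deeply (e.g. mc:AlternateContent, wps:txbx).
    static final String HEADER_XML =
            "<w:hdr xmlns:w='" + W + "'>"
            + "<w:p><w:r><w:pict><w:txbxContent>"
            + "<w:p><w:r><w:t>&lt;&lt;Firstname&gt;&gt;</w:t></w:r></w:p>"
            + "</w:txbxContent></w:pict></w:r></w:p>"
            + "</w:hdr>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getNamespaceURI()/getLocalName()
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Counts w:p elements that are direct children of the root element,
    // which is roughly what IBody.getParagraphs() reports.
    static int countDirectParagraphs(String xml) {
        int count = 0;
        Node child = parse(xml).getDocumentElement().getFirstChild();
        for (; child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && W.equals(child.getNamespaceURI())
                    && "p".equals(child.getLocalName())) {
                count++;
            }
        }
        return count;
    }

    // Counts w:p elements at any depth, including the one inside the text box.
    static int countAllParagraphs(String xml) {
        return parse(xml).getElementsByTagNameNS(W, "p").getLength();
    }

    public static void main(String[] args) {
        System.out.println(countDirectParagraphs(HEADER_XML)); // 1
        System.out.println(countAllParagraphs(HEADER_XML));    // 2
    }
}
```

The placeholder paragraph is only reachable by descending into the text box, which the direct-children view never does.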
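Since a *.docx is a ZIP archive containing XML files for the document, the headers and the footers, one can get all text runs by creating an XmlCursor that selects all w:r XML elements. For an XWPFHeader this could look like this: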
private List<XmlObject> getAllCTRs(XWPFHeader header) {
    CTHdrFtr ctHdrFtr = header._getHdrFtr();
    XmlCursor cursor = ctHdrFtr.newCursor();
    //select all w:r elements at any depth, including those in text boxes
    cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
    List<XmlObject> ctrInHdrFtr = new ArrayList<XmlObject>();
    while (cursor.hasNextSelection()) {
        cursor.toNextSelection();
        XmlObject obj = cursor.getObject();
        ctrInHdrFtr.add(obj);
    }
    cursor.dispose(); //release the cursor once all runs are collected
    return ctrInHdrFtr;
}
Now we have a list of all XML elements in that header which are text-run elements (w:r) in Word.
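To see which XML parts such a cursor works on, the ZIP entries of a *.docx or *.dotx can be listed with nothing but java.util.zip. The sketch below is hypothetical (class and method names are made up) and assumes Word's usual part names such as word/header1.xml and word/footer1.xml; strictly, the package relationships decide which part is a header, so the name filter is only a heuristic. The demo ZIP is built in memory and is not a valid Word file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ListHeaderFooterParts {

    // Returns the ZIP entry names that follow Word's usual header/footer naming.
    // Heuristic: the package relationships, not the file names, define the part types.
    static List<String> headerFooterParts(InputStream docx) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().matches("word/(header|footer)\\d*\\.xml")) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Demo only: builds an in-memory ZIP with the given entry names (not a valid Word file).
    static byte[] demoZip(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = demoZip("[Content_Types].xml", "word/document.xml",
                "word/header1.xml", "word/footer1.xml", "word/styles.xml");
        System.out.println(headerFooterParts(new ByteArrayInputStream(zip)));
        // prints [word/header1.xml, word/footer1.xml]
    }
}
```

For a real file, a FileInputStream on template.dotx can be passed to headerFooterParts instead of the demo bytes.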
We could have a more generic getAllCTRs which gets all CTR elements from any kind of IBody, like so:
private List<XmlObject> getAllCTRs(IBody iBody) {
    XmlCursor cursor = null;
    List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();
    if (iBody instanceof XWPFHeaderFooter) {
        XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
        CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
        cursor = ctHdrFtr.newCursor();
    } else if (iBody instanceof XWPFDocument) {
        XWPFDocument document = (XWPFDocument)iBody;
        CTDocument1 ctDocument1 = document.getDocument();
        cursor = ctDocument1.newCursor();
    } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
        XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
        CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
        cursor = ctFtnEdn.newCursor();
    }
    if (cursor != null) {
        //select all w:r elements at any depth, including those in text boxes
        cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
        while (cursor.hasNextSelection()) {
            cursor.toNextSelection();
            XmlObject obj = cursor.getObject();
            ctrInIBody.add(obj);
        }
        cursor.dispose(); //release the cursor once all runs are collected
    }
    return ctrInIBody;
}
Now we have a list of all XML elements in that IBody which are text-run elements in Word.
Having those, we can get the text out of them like this:
private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
    List<XmlObject> ctrInIBody = getAllCTRs(iBody);
    for (XmlObject obj : ctrInIBody) {
        CTR ctr = CTR.Factory.parse(obj.xmlText());
        for (CTText ctText : ctr.getTList()) {
            String text = ctText.getStringValue();
            System.out.println(text);
        }
    }
}
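For comparison, the same selection can be sketched with the JDK's own XPath API instead of XMLBeans: the path .//*/w:r selects every run at any depth, including runs inside text boxes, and only the direct w:t children of each run are read, similar to CTR.getTList(). The class name and the sample XML are hypothetical; runs without any w:t (such as the run that merely wraps the text box) are skipped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class RunTextsWithXPath {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical, simplified header: one run directly in the paragraph, one inside a text box.
    static final String SAMPLE =
            "<w:hdr xmlns:w='" + W + "'><w:p>"
            + "<w:r><w:t>Name:</w:t></w:r>"
            + "<w:r><w:pict><w:txbxContent><w:p>"
            + "<w:r><w:t>&lt;&lt;Firstname&gt;&gt;</w:t></w:r>"
            + "</w:p></w:txbxContent></w:pict></w:r>"
            + "</w:p></w:hdr>";

    // Binds the prefix "w", like "declare namespace w=..." in the XmlCursor path.
    static final NamespaceContext NS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "w".equals(prefix) ? W : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return W.equals(uri) ? "w" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return W.equals(uri) ? Collections.singletonList("w").iterator()
                                 : Collections.<String>emptyList().iterator();
        }
    };

    // Returns the text of every w:r (at any depth) that has w:t children, in document order.
    static List<String> runTexts(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(NS);
            NodeList runs = (NodeList) xpath.evaluate(".//*/w:r", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < runs.getLength(); i++) {
                // Only direct w:t children, so a run wrapping a text box
                // does not repeat the text of the runs nested inside it.
                NodeList ts = (NodeList) xpath.evaluate("w:t", runs.item(i), XPathConstants.NODESET);
                if (ts.getLength() == 0) {
                    continue;
                }
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < ts.getLength(); j++) {
                    sb.append(ts.item(j).getTextContent());
                }
                texts.add(sb.toString());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot evaluate XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runTexts(SAMPLE)); // [Name:, <<Firstname>>]
    }
}
```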
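This probably shows the next challenge, because Word is very messy when it creates text-run elements. For example, your placeholder <<Firstname>> can be split into the text runs << + Firstname + >>. The reason may be different formatting, spell checking or something else. Even this is possible: << + Lastname + >>; << + YearOfBirth + >>. Or even this: <<Firstname + >> << + Lastname>>; << + YearOfBirth>>. You see, replacing the placeholders with text is nearly impossible, because the placeholders may be split over multiple text runs.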
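One common workaround, which is not part of the approach in this answer, is to search the concatenated text of all runs and then rewrite only the affected runs: the replacement goes into the run where the placeholder starts, and the remaining placeholder characters are cut out of the following runs. The sketch below models runs as plain strings (an assumption; the class and method names are hypothetical). With POI each string would correspond to one run's text, and the inserted value would take the formatting of the first affected run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitPlaceholderReplace {

    // Sketch only: run texts are modelled as plain strings.
    // Replaces every occurrence of placeholder in the concatenated run texts.
    // The value is put into the run where the placeholder starts; the rest of
    // the placeholder is cut out of the following runs. The run count is unchanged.
    static List<String> replaceAcrossRuns(List<String> runs, String placeholder, String value) {
        if (placeholder.isEmpty()) {
            throw new IllegalArgumentException("placeholder must not be empty");
        }
        List<String> out = new ArrayList<>(runs);
        int from = 0; // where to continue searching in the joined text
        while (true) {
            String joined = String.join("", out);
            int start = joined.indexOf(placeholder, from);
            if (start < 0) {
                return out;
            }
            int end = start + placeholder.length(); // exclusive
            int runStart = 0;                       // offset of run i in the joined text
            boolean inserted = false;
            for (int i = 0; i < out.size(); i++) {
                String text = out.get(i);
                int runEnd = runStart + text.length();
                // Overlap of the placeholder [start, end) with this run [runStart, runEnd).
                int a = Math.max(start, runStart);
                int b = Math.min(end, runEnd);
                if (a < b) {
                    String before = text.substring(0, a - runStart);
                    String after = text.substring(b - runStart);
                    out.set(i, before + (inserted ? "" : value) + after);
                    inserted = true;
                }
                runStart = runEnd;
            }
            // Continue after the inserted value so it is never re-scanned.
            from = start + value.length();
        }
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("<<Firstname", ">> <<", "Lastname>>; <<", "YearOfBirth>>");
        runs = replaceAcrossRuns(runs, "<<Firstname>>", "Axel");
        runs = replaceAcrossRuns(runs, "<<Lastname>>", "Richter");
        System.out.println(runs); // [Axel,  Richter, ; <<, YearOfBirth>>]
    }
}
```

Runs that become empty are kept as empty strings; deleting them, and deciding whose formatting should win, is left out of this sketch.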
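To avoid this, template.dotx needs to be created by a user who knows what they are doing.
First, turn spell checking off, and grammar checking too. Otherwise every possible spelling error or grammar violation Word finds is put into a separate text run so that it can be marked accordingly.
Second, make sure the whole placeholder is formatted the same way. Differently formatted text also has to be in separate text runs.
I really doubt this will work reliably. But try it yourself.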
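To check whether a template actually meets these requirements, one could look for placeholders whose characters are spread over more than one run. A minimal hypothetical sketch, assuming placeholders of the form <<Name>> (word characters only) and run texts that have already been extracted as strings, for example per paragraph:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitPlaceholderCheck {
    // Assumption: placeholders look like <<Name>> with word characters only.
    static final Pattern PLACEHOLDER = Pattern.compile("<<\\w+>>");

    // Returns the placeholders whose characters are spread over more than one run,
    // i.e. the ones a run-by-run replacement would miss.
    static List<String> splitPlaceholders(List<String> runs) {
        int[] runEnds = new int[runs.size()]; // runEnds[i]: offset just after run i
        int pos = 0;
        for (int i = 0; i < runs.size(); i++) {
            pos += runs.get(i).length();
            runEnds[i] = pos;
        }
        List<String> split = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(String.join("", runs));
        while (m.find()) {
            if (runIndexAt(runEnds, m.start()) != runIndexAt(runEnds, m.end() - 1)) {
                split.add(m.group());
            }
        }
        return split;
    }

    // Index of the run that contains the character at the given offset.
    static int runIndexAt(int[] runEnds, int offset) {
        int i = 0;
        while (runEnds[i] <= offset) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(splitPlaceholders(Arrays.asList("Name: <<Firstname>>"))); // []
        System.out.println(splitPlaceholders(Arrays.asList("<<", "Lastname", ">>; <<YearOfBirth>>")));
        // [<<Lastname>>]
    }
}
```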
Complete example:
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import java.util.List;
import java.util.ArrayList;

public class WordEditAllIBodys {

    private List<XmlObject> getAllCTRs(IBody iBody) {
        XmlCursor cursor = null;
        List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();
        if (iBody instanceof XWPFHeaderFooter) {
            XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
            CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
            cursor = ctHdrFtr.newCursor();
        } else if (iBody instanceof XWPFDocument) {
            XWPFDocument document = (XWPFDocument)iBody;
            CTDocument1 ctDocument1 = document.getDocument();
            cursor = ctDocument1.newCursor();
        } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
            XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
            CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
            cursor = ctFtnEdn.newCursor();
        }
        if (cursor != null) {
            //select all w:r elements at any depth, including those in text boxes
            cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
            while (cursor.hasNextSelection()) {
                cursor.toNextSelection();
                XmlObject obj = cursor.getObject();
                ctrInIBody.add(obj);
            }
            cursor.dispose(); //release the cursor once all runs are collected
        }
        return ctrInIBody;
    }

    private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
        List<XmlObject> ctrInIBody = getAllCTRs(iBody);
        for (XmlObject obj : ctrInIBody) {
            CTR ctr = CTR.Factory.parse(obj.xmlText());
            for (CTText ctText : ctr.getTList()) {
                String text = ctText.getStringValue();
                System.out.println(text);
            }
        }
    }

    private void replaceTextInTextRunsOfIBody(IBody iBody, String placeHolder, String textValue) throws Exception {
        List<XmlObject> ctrInIBody = getAllCTRs(iBody);
        for (XmlObject obj : ctrInIBody) {
            //work on a parsed copy of the run, then write it back into the original XML
            CTR ctr = CTR.Factory.parse(obj.xmlText());
            for (CTText ctText : ctr.getTList()) {
                String text = ctText.getStringValue();
                if (text != null && text.contains(placeHolder)) {
                    text = text.replace(placeHolder, textValue);
                    ctText.setStringValue(text);
                    obj.set(ctr);
                }
            }
        }
    }

    public void generateDocX() throws Exception {
        //Get the template for generating the new docx file; close the input stream after reading
        File template = new File("./template.dotx");
        XWPFDocument document;
        try (FileInputStream inputStream = new FileInputStream(template)) {
            document = new XWPFDocument(inputStream);
        }
        //traverse all headers
        for (XWPFHeader header : document.getHeaderList()) {
            printAllTextInTextRunsOfIBody(header);
            replaceTextInTextRunsOfIBody(header, "<<Firstname>>", "Axel");
            replaceTextInTextRunsOfIBody(header, "<<Lastname>>", "Richter");
            replaceTextInTextRunsOfIBody(header, "<<ProfessionalTitle>>", "Skeptic");
        }
        //traverse all footers
        for (XWPFFooter footer : document.getFooterList()) {
            printAllTextInTextRunsOfIBody(footer);
            replaceTextInTextRunsOfIBody(footer, "<<Firstname>>", "Axel");
            replaceTextInTextRunsOfIBody(footer, "<<Lastname>>", "Richter");
            replaceTextInTextRunsOfIBody(footer, "<<ProfessionalTitle>>", "Skeptic");
        }
        //traverse document body; note: tables need not be traversed separately because they are in the document body
        printAllTextInTextRunsOfIBody(document);
        replaceTextInTextRunsOfIBody(document, "<<Firstname>>", "Axel");
        replaceTextInTextRunsOfIBody(document, "<<Lastname>>", "Richter");
        replaceTextInTextRunsOfIBody(document, "<<ProfessionalTitle>>", "Skeptic");
        //traverse all footnotes
        for (XWPFFootnote footnote : document.getFootnotes()) {
            printAllTextInTextRunsOfIBody(footnote);
            replaceTextInTextRunsOfIBody(footnote, "<<Firstname>>", "Axel");
            replaceTextInTextRunsOfIBody(footnote, "<<Lastname>>", "Richter");
            replaceTextInTextRunsOfIBody(footnote, "<<ProfessionalTitle>>", "Skeptic");
        }
        //traverse all endnotes
        for (XWPFEndnote endnote : document.getEndnotes()) {
            printAllTextInTextRunsOfIBody(endnote);
            replaceTextInTextRunsOfIBody(endnote, "<<Firstname>>", "Axel");
            replaceTextInTextRunsOfIBody(endnote, "<<Lastname>>", "Richter");
            replaceTextInTextRunsOfIBody(endnote, "<<ProfessionalTitle>>", "Skeptic");
        }
        //since the document was opened from a *.dotx, the content type needs to be changed
        document.getPackage().replaceContentType(
            "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");
        //Write the changes to the new docx file and close the document
        try (FileOutputStream outputStream = new FileOutputStream(new File("./" + 1234 + "header.docx"))) {
            document.write(outputStream);
        }
        document.close();
    }

    public static void main(String[] args) throws Exception {
        WordEditAllIBodys app = new WordEditAllIBodys();
        app.generateDocX();
    }
}
By the way: since your document was opened from a *.dotx, the content type needs to be changed from wordprocessingml.template to wordprocessingml.document. Otherwise Word will not open the generated *.docx document. See Converting a file with ".dotx" extension (template) to "docx" (Word File).
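The complete example does this conversion with OPCPackage.replaceContentType. As a POI-free illustration of what changes in the package, the following hypothetical sketch copies a ZIP and rewrites only the main part's content type inside [Content_Types].xml. It assumes the template uses the plain template.main+xml type (a macro-enabled *.dotm uses a different one) and that [Content_Types].xml is UTF-8; zipOf and entryText are demo helpers for in-memory ZIPs only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DotxToDocxContentType {
    static final String TEMPLATE_MAIN =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml";
    static final String DOCUMENT_MAIN =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";

    // Copies all entries unchanged except [Content_Types].xml, in which the
    // template main-part content type is replaced by the document one.
    static byte[] convert(byte[] dotx) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(dotx));
             ZipOutputStream out = new ZipOutputStream(bytes)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = readEntry(in);
                if ("[Content_Types].xml".equals(entry.getName())) {
                    String xml = new String(data, StandardCharsets.UTF_8); // assumes UTF-8
                    data = xml.replace(TEMPLATE_MAIN, DOCUMENT_MAIN).getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read only after the ZipOutputStream has been closed
    }

    // Reads the remaining bytes of the current ZIP entry.
    static byte[] readEntry(ZipInputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Demo helper: builds a ZIP from alternating entry names and UTF-8 contents.
    static byte[] zipOf(String... nameAndContent) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < nameAndContent.length; i += 2) {
                out.putNextEntry(new ZipEntry(nameAndContent[i]));
                out.write(nameAndContent[i + 1].getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Demo helper: returns the UTF-8 text of one entry, or null if it is missing.
    static String entryText(byte[] zip, String name) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (name.equals(entry.getName())) {
                    return new String(readEntry(in), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String types = "<Types><Override PartName=\"/word/document.xml\" ContentType=\""
                + TEMPLATE_MAIN + "\"/></Types>";
        byte[] docx = convert(zipOf("[Content_Types].xml", types, "word/document.xml", "<w:document/>"));
        System.out.println(entryText(docx, "[Content_Types].xml").contains(DOCUMENT_MAIN)); // true
    }
}
```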
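Since I am skeptical about the replacing-placeholder-text approach, I prefer filling in form fields. See Problem with processing word document java. Of course, such form fields cannot be used in headers or footers, so headers and footers would have to be created from scratch.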