更新 DOCX 中文本标记的引用内容
Update content of references to text mark in DOCX
我的需求简而言之:我想使用 Apache POI 5 刷新对 docx 文档中文本标记的引用。
上下文:在 docx 文档中,我的系统替换占位符中的文本(例如“${myplaceholder}”)。其中一些占位符位于文本标记内。这很好用。
文档中有对文本标记的引用。替换占位符(在文本标记内)后,我打开 docx 文档,使用 Ctrl+A select 一切并按 F9。然后更新所有引用并包含来自引用文本标记/占位符的文本。
Problem/Quest: 我不希望(系统用户)按 Ctrl+A / F9 来更新引用。
问题:有没有办法 (a) 强制 Microsoft Word 刷新所有引用(这对于带有 Apache POI 的 xlsx 文件是可行的)或 (b)刷新 Apache POI 5 中的所有引用?
更新+简单代码示例:
这是输入 docx 文档的内容(其中第二个“${firstname}”是对第一个“${firstname}”的引用(在 MS Word 中标记为文本标记)):
这是将一些文本添加到“名字”占位符的代码:
File inputDocxFile = new File("Reference.docx");
File outputDocxFile = new File("Reference_output.docx");
XWPFDocument document = new XWPFDocument(new FileInputStream(inputDocxFile));
for (XWPFParagraph paragraph : document.getParagraphs()) {
System.out.println("Paragraph: " + paragraph.getText());
for (XWPFRun run : paragraph.getRuns()) {
System.out.println("RUN: " + run.text());
if (paragraph.getText().equals("${firstname}") && run.text().equals("firstname")) {
run.setText("World");
}
}
}
FileOutputStream fos = new FileOutputStream(outputDocxFile);
document.write(fos);
fos.close();
document.close();
这是输出(没有刷新参考):
按下 Ctrl+A / F9 后,这是刷新的(预期的)输出:
当 text-replacement 正常工作时,整个问题就消失了。
这里的问题是 Word 如何在不同的文本运行中存储文本。不仅不同的格式会在不同的文本运行中拆分文本,还会标记语法和拼写检查问题以及其他许多事情。因此,人们无法预测在 Word 中键入文本时如何将文本拆分成多个文本行。这就是为什么你的 text-replacement 方法不好。
Apache POI 提供 TextSegment to solve those kind of problems. And using current apache poi 5.2.0
this also seems to work correctly. Former versions had have bugs in XWPFParagraph.searchText
- see Apache POI: ${my_placeholder} is treated as three different runs 解决方法。
使用 TextSegment
可以确定搜索文本的开始和结束,因此 text-replacement 更好。
以下示例应说明这一点。
我的 Reference.docx
看起来是这样的:
head中的${firstname}
、${lastname}
和${address}
被收藏为firstname
。 lastname
和 address
。它们在文本中的出现被引用为 { REF firstname }
、 { REF lastname}
和 { REF address}
在 运行 以下代码之后:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
public class WordReplaceTextSegment {
static public void replaceTextSegment(XWPFParagraph paragraph, String textToFind, String replacement) {
TextSegment foundTextSegment = null;
PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
while((foundTextSegment = paragraph.searchText(textToFind, startPos)) != null) { // search all text segments having text to find
//System.out.println(foundTextSegment.getBeginRun()+":"+foundTextSegment.getBeginText()+":"+foundTextSegment.getBeginChar());
//System.out.println(foundTextSegment.getEndRun()+":"+foundTextSegment.getEndText()+":"+foundTextSegment.getEndChar());
// maybe there is text before textToFind in begin run
XWPFRun beginRun = paragraph.getRuns().get(foundTextSegment.getBeginRun());
String textInBeginRun = beginRun.getText(foundTextSegment.getBeginText());
String textBefore = textInBeginRun.substring(0, foundTextSegment.getBeginChar()); // we only need the text before
// maybe there is text after textToFind in end run
XWPFRun endRun = paragraph.getRuns().get(foundTextSegment.getEndRun());
String textInEndRun = endRun.getText(foundTextSegment.getEndText());
String textAfter = textInEndRun.substring(foundTextSegment.getEndChar() + 1); // we only need the text after
if (foundTextSegment.getEndRun() == foundTextSegment.getBeginRun()) {
textInBeginRun = textBefore + replacement + textAfter; // if we have only one run, we need the text before, then the replacement, then the text after in that run
} else {
textInBeginRun = textBefore + replacement; // else we need the text before followed by the replacement in begin run
endRun.setText(textAfter, foundTextSegment.getEndText()); // and the text after in end run
}
beginRun.setText(textInBeginRun, foundTextSegment.getBeginText());
// runs between begin run and end run needs to be removed
for (int runBetween = foundTextSegment.getEndRun() - 1; runBetween > foundTextSegment.getBeginRun(); runBetween--) {
paragraph.removeRun(runBetween); // remove not needed runs
}
}
}
public static void main(String[] args) throws Exception {
XWPFDocument doc = new XWPFDocument(new FileInputStream("./Reference.docx"));
String[] textsToFind = {"${firstname}", "${lastname}", "${address}"}; // might be in different runs
String[] replacements = {"Axel", "Richter", "Somewhere in Germany"};
for (XWPFParagraph paragraph : doc.getParagraphs()) { //go through all paragraphs
for (int i = 0; i < textsToFind.length; i++) {
String textToFind = textsToFind[i];
if (paragraph.getText().contains(textToFind)) { // paragraph contains text to find
String replacement = replacements[i];
replaceTextSegment(paragraph, textToFind, replacement);
}
}
}
FileOutputStream out = new FileOutputStream("./Reference_output.docx");
doc.write(out);
out.close();
doc.close();
}
}
Reference_output.docx
看起来像这样:
所有替换都已完成,书签和对书签的引用仍然存在。
我的需求简而言之:我想使用 Apache POI 5 刷新对 docx 文档中文本标记的引用。
上下文:在 docx 文档中,我的系统替换占位符中的文本(例如“${myplaceholder}”)。其中一些占位符位于文本标记内。这很好用。
文档中有对文本标记的引用。替换占位符(在文本标记内)后,我打开 docx 文档,使用 Ctrl+A select 一切并按 F9。然后更新所有引用并包含来自引用文本标记/占位符的文本。
Problem/Quest: 我不希望(系统用户)按 Ctrl+A / F9 来更新引用。
问题:有没有办法 (a) 强制 Microsoft Word 刷新所有引用(这对于带有 Apache POI 的 xlsx 文件是可行的)或 (b)刷新 Apache POI 5 中的所有引用?
更新+简单代码示例:
这是输入 docx 文档的内容(其中第二个“${firstname}”是对第一个“${firstname}”的引用(在 MS Word 中标记为文本标记)):
这是将一些文本添加到“名字”占位符的代码:
File inputDocxFile = new File("Reference.docx");
File outputDocxFile = new File("Reference_output.docx");
XWPFDocument document = new XWPFDocument(new FileInputStream(inputDocxFile));
for (XWPFParagraph paragraph : document.getParagraphs()) {
System.out.println("Paragraph: " + paragraph.getText());
for (XWPFRun run : paragraph.getRuns()) {
System.out.println("RUN: " + run.text());
if (paragraph.getText().equals("${firstname}") && run.text().equals("firstname")) {
run.setText("World");
}
}
}
FileOutputStream fos = new FileOutputStream(outputDocxFile);
document.write(fos);
fos.close();
document.close();
这是输出(没有刷新参考):
按下 Ctrl+A / F9 后,这是刷新的(预期的)输出:
当 text-replacement 正常工作时,整个问题就消失了。
这里的问题是 Word 如何在不同的文本运行中存储文本。不仅不同的格式会在不同的文本运行中拆分文本,还会标记语法和拼写检查问题以及其他许多事情。因此,人们无法预测在 Word 中键入文本时如何将文本拆分成多个文本行。这就是为什么你的 text-replacement 方法不好。
Apache POI 提供 TextSegment to solve those kind of problems. And using current apache poi 5.2.0
this also seems to work correctly. Former versions had have bugs in XWPFParagraph.searchText
- see Apache POI: ${my_placeholder} is treated as three different runs 解决方法。
使用 TextSegment
可以确定搜索文本的开始和结束,因此 text-replacement 更好。
以下示例应说明这一点。
我的 Reference.docx
看起来是这样的:
head中的${firstname}
、${lastname}
和${address}
被收藏为firstname
。 lastname
和 address
。它们在文本中的出现被引用为 { REF firstname }
、 { REF lastname}
和 { REF address}
在 运行 以下代码之后:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
public class WordReplaceTextSegment {
static public void replaceTextSegment(XWPFParagraph paragraph, String textToFind, String replacement) {
TextSegment foundTextSegment = null;
PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
while((foundTextSegment = paragraph.searchText(textToFind, startPos)) != null) { // search all text segments having text to find
//System.out.println(foundTextSegment.getBeginRun()+":"+foundTextSegment.getBeginText()+":"+foundTextSegment.getBeginChar());
//System.out.println(foundTextSegment.getEndRun()+":"+foundTextSegment.getEndText()+":"+foundTextSegment.getEndChar());
// maybe there is text before textToFind in begin run
XWPFRun beginRun = paragraph.getRuns().get(foundTextSegment.getBeginRun());
String textInBeginRun = beginRun.getText(foundTextSegment.getBeginText());
String textBefore = textInBeginRun.substring(0, foundTextSegment.getBeginChar()); // we only need the text before
// maybe there is text after textToFind in end run
XWPFRun endRun = paragraph.getRuns().get(foundTextSegment.getEndRun());
String textInEndRun = endRun.getText(foundTextSegment.getEndText());
String textAfter = textInEndRun.substring(foundTextSegment.getEndChar() + 1); // we only need the text after
if (foundTextSegment.getEndRun() == foundTextSegment.getBeginRun()) {
textInBeginRun = textBefore + replacement + textAfter; // if we have only one run, we need the text before, then the replacement, then the text after in that run
} else {
textInBeginRun = textBefore + replacement; // else we need the text before followed by the replacement in begin run
endRun.setText(textAfter, foundTextSegment.getEndText()); // and the text after in end run
}
beginRun.setText(textInBeginRun, foundTextSegment.getBeginText());
// runs between begin run and end run needs to be removed
for (int runBetween = foundTextSegment.getEndRun() - 1; runBetween > foundTextSegment.getBeginRun(); runBetween--) {
paragraph.removeRun(runBetween); // remove not needed runs
}
}
}
public static void main(String[] args) throws Exception {
XWPFDocument doc = new XWPFDocument(new FileInputStream("./Reference.docx"));
String[] textsToFind = {"${firstname}", "${lastname}", "${address}"}; // might be in different runs
String[] replacements = {"Axel", "Richter", "Somewhere in Germany"};
for (XWPFParagraph paragraph : doc.getParagraphs()) { //go through all paragraphs
for (int i = 0; i < textsToFind.length; i++) {
String textToFind = textsToFind[i];
if (paragraph.getText().contains(textToFind)) { // paragraph contains text to find
String replacement = replacements[i];
replaceTextSegment(paragraph, textToFind, replacement);
}
}
}
FileOutputStream out = new FileOutputStream("./Reference_output.docx");
doc.write(out);
out.close();
doc.close();
}
}
Reference_output.docx
看起来像这样:
所有替换都已完成,书签和对书签的引用仍然存在。