Apache POI: ${my_placeholder} is treated as three different runs
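I have a .docx template with placeholders to be filled in, such as ${programming_language}, ${education}, etc.
The placeholder keywords must be easy to distinguish from ordinary words, so they are enclosed in ${ }.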
for (XWPFTable table : doc.getTables()) {
    for (XWPFTableRow row : table.getRows()) {
        for (XWPFTableCell cell : row.getTableCells()) {
            for (XWPFParagraph paragraph : cell.getParagraphs()) {
                for (XWPFRun run : paragraph.getRuns()) {
                    System.out.println("run text: " + run.text());
                    /** replace text here, etc. */
                }
            }
        }
    }
}
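I want to extract the placeholders together with the enclosing ${ } characters. The problem is that the enclosing characters seem to be treated as separate runs: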
run text: ${
run text: programming_language
run text: }
run text: Some plain text here
run text: ${
run text: education
run text: }
Instead, I would like to get the following:
run text: ${programming_language}
run text: Some plain text here
run text: ${education}
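I have tried other enclosing characters, such as { }, < >, # #, etc.
I do not want to do some weird concatenation of runs. I want to have each placeholder in a single XWPFRun.
If I cannot find a proper solution, I will just name them like this: VAR_PROGRAMMING_LANGUGE, VAR_EDUCATION, I think.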
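The current Apache POI 4.1.2 provides TextSegment to deal with those Word text-run issues. XWPFParagraph.searchText searches for a string in a paragraph and returns a TextSegment. This gives access to the begin run and the end run of that text in the paragraph (BeginRun and EndRun). It also gives access to the start character position in the begin run and the end character position in the end run (BeginChar and EndChar).
In addition, it gives access to the index of the text element within the text run (BeginText and EndText). This should always be 0, because a default text run has only one text element.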
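To make the meaning of those four numbers concrete, here is a minimal sketch that does not use POI at all. It models a paragraph as a plain list of run texts; the class RunSpanSearch, its Span type, and the find method are hypothetical names invented for this illustration, not part of the POI API. The end character position is treated as inclusive, which is how the replacement code in this answer uses EndChar.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: a POI-free model of locating a string that spans several runs. */
final class RunSpanSearch {

    /** Hypothetical stand-in for TextSegment: run indices and character offsets of a match. */
    static final class Span {
        final int beginRun, beginChar, endRun, endChar;

        Span(int beginRun, int beginChar, int endRun, int endChar) {
            this.beginRun = beginRun;
            this.beginChar = beginChar;
            this.endRun = endRun;
            this.endChar = endChar;
        }
    }

    /** Returns the position of the first occurrence of searched, or null if it is absent. */
    static Span find(List<String> runs, String searched) {
        if (searched.isEmpty()) {
            return null;
        }
        StringBuilder all = new StringBuilder();
        for (String run : runs) {
            all.append(run);
        }
        int start = all.indexOf(searched);
        if (start < 0) {
            return null;
        }
        int end = start + searched.length() - 1; // inclusive end offset
        int[] begin = locate(runs, start);
        int[] last = locate(runs, end);
        return new Span(begin[0], begin[1], last[0], last[1]);
    }

    /** Maps an offset in the concatenated text to {run index, character index in that run}. */
    private static int[] locate(List<String> runs, int offset) {
        for (int i = 0; i < runs.size(); i++) {
            int length = runs.get(i).length();
            if (offset < length) {
                return new int[] {i, offset};
            }
            offset -= length;
        }
        throw new IllegalArgumentException("offset beyond end of text");
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Hello ", "${", "programming_language", "}", " world");
        Span span = find(runs, "${programming_language}");
        System.out.println(span.beginRun + ":" + span.beginChar + " to " + span.endRun + ":" + span.endChar);
    }
}
```

For the run list in main, the placeholder starts at character 0 of run 1 and ends at character 0 of run 3.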
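With this, we can do the following:
First, replace the part of the search string found in the begin run with the replacement. To do so, take the text before the search string and append the replacement to it. After that, the begin run contains the whole replacement.
Then, remove all text runs between the begin run and the end run, since they only contain parts of the search string that are no longer needed.
Finally, keep only the text after the search string in the end run.
Doing so, we can replace text that is spread across multiple text runs.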
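The three steps can be traced on plain strings before involving POI. The sketch below is an illustration only: RunReplaceSketch and its replace method are hypothetical names, each run is modelled as a String, and the end character position is assumed to be inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: the three replacement steps applied to a list of run texts. */
final class RunReplaceSketch {

    static List<String> replace(List<String> runs, int beginRun, int beginChar,
                                int endRun, int endChar, String replacement) {
        List<String> result = new ArrayList<>(runs);
        String textBefore = result.get(beginRun).substring(0, beginChar);
        String textAfter = result.get(endRun).substring(endChar + 1);

        if (beginRun == endRun) {
            // The match sits in a single run: before + replacement + after.
            result.set(beginRun, textBefore + replacement + textAfter);
            return result;
        }
        // Step 1: the begin run keeps the text before the match plus the whole replacement.
        result.set(beginRun, textBefore + replacement);
        // Step 3: the end run keeps only the text after the match.
        result.set(endRun, textAfter);
        // Step 2: drop the runs in between, from back to front so indices stay valid.
        for (int i = endRun - 1; i > beginRun; i--) {
            result.remove(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Hello ", "${", "programming_language", "}", " world");
        System.out.println(replace(runs, 1, 0, 3, 0, "Java"));
    }
}
```

Note that the end run survives as an empty run when nothing follows the match in it. That is harmless in this model and mirrors what the POI code in this answer does.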
The following example shows this.
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

public class WordReplaceTextSegment {

    public static void replaceTextSegment(XWPFParagraph paragraph, String textToFind, String replacement) {
        TextSegment foundTextSegment = null;
        PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
        while ((foundTextSegment = paragraph.searchText(textToFind, startPos)) != null) { // search all text segments having text to find
            System.out.println(foundTextSegment.getBeginRun() + ":" + foundTextSegment.getBeginText() + ":" + foundTextSegment.getBeginChar());
            System.out.println(foundTextSegment.getEndRun() + ":" + foundTextSegment.getEndText() + ":" + foundTextSegment.getEndChar());

            // maybe there is text before textToFind in begin run
            XWPFRun beginRun = paragraph.getRuns().get(foundTextSegment.getBeginRun());
            String textInBeginRun = beginRun.getText(foundTextSegment.getBeginText());
            String textBefore = textInBeginRun.substring(0, foundTextSegment.getBeginChar()); // we only need the text before

            // maybe there is text after textToFind in end run
            XWPFRun endRun = paragraph.getRuns().get(foundTextSegment.getEndRun());
            String textInEndRun = endRun.getText(foundTextSegment.getEndText());
            String textAfter = textInEndRun.substring(foundTextSegment.getEndChar() + 1); // we only need the text after

            if (foundTextSegment.getEndRun() == foundTextSegment.getBeginRun()) {
                textInBeginRun = textBefore + replacement + textAfter; // if we have only one run, we need the text before, then the replacement, then the text after in that run
            } else {
                textInBeginRun = textBefore + replacement; // else we need the text before followed by the replacement in begin run
                endRun.setText(textAfter, foundTextSegment.getEndText()); // and the text after in end run
            }

            beginRun.setText(textInBeginRun, foundTextSegment.getBeginText());

            // runs between begin run and end run need to be removed
            for (int runBetween = foundTextSegment.getEndRun() - 1; runBetween > foundTextSegment.getBeginRun(); runBetween--) {
                paragraph.removeRun(runBetween); // remove not needed runs
            }
        }
    }

    public static void main(String[] args) throws Exception {
        XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));

        String textToFind = "${This is the text to find}"; // might be in different runs
        String replacement = "Replacement text";

        for (XWPFParagraph paragraph : doc.getParagraphs()) { // go through all paragraphs
            if (paragraph.getText().contains(textToFind)) { // paragraph contains text to find
                replaceTextSegment(paragraph, textToFind, replacement);
            }
        }

        FileOutputStream out = new FileOutputStream("result.docx");
        doc.write(out);
        out.close();
        doc.close();
    }
}
The above code does not work in all cases, because XWPFParagraph.searchText has bugs. So I will provide a better searchText method:
/**
 * Parses the paragraph and searches for the string searched.
 * If the string is found, a TextSegment describing its position is returned;
 * otherwise null is returned.
 * Needs org.apache.xmlbeans.XmlCursor and org.apache.xmlbeans.XmlObject
 * in addition to the imports of the example above.
 *
 * @param paragraph the paragraph to search in
 * @param searched the string to search for
 * @param startPos the position (run, text element, character) to start searching from
 * @return the TextSegment of the first match, or null if there is none
 */
static TextSegment searchText(XWPFParagraph paragraph, String searched, PositionInParagraph startPos) {
    int startRun = startPos.getRun(),
        startText = startPos.getText(),
        startChar = startPos.getChar();
    int beginRunPos = 0, candCharPos = 0;
    boolean newList = false;

    //CTR[] rArray = paragraph.getRArray(); //This does not contain all runs. It lacks hyperlink runs for ex.
    java.util.List<XWPFRun> runs = paragraph.getRuns();

    int beginTextPos = 0, beginCharPos = 0; //must be outside the for loop

    //for (int runPos = startRun; runPos < rArray.length; runPos++) {
    for (int runPos = startRun; runPos < runs.size(); runPos++) {
        //int beginTextPos = 0, beginCharPos = 0, textPos = 0, charPos; //int beginTextPos = 0, beginCharPos = 0 must be outside the for loop
        int textPos = 0, charPos;
        //CTR ctRun = rArray[runPos];
        CTR ctRun = runs.get(runPos).getCTR();
        XmlCursor c = ctRun.newCursor();
        c.selectPath("./*");
        try {
            while (c.toNextSelection()) {
                XmlObject o = c.getObject();
                if (o instanceof CTText) {
                    if (textPos >= startText) {
                        String candidate = ((CTText) o).getStringValue();
                        if (runPos == startRun) {
                            charPos = startChar;
                        } else {
                            charPos = 0;
                        }

                        for (; charPos < candidate.length(); charPos++) {
                            if ((candidate.charAt(charPos) == searched.charAt(0)) && (candCharPos == 0)) {
                                beginTextPos = textPos;
                                beginCharPos = charPos;
                                beginRunPos = runPos;
                                newList = true;
                            }
                            if (candidate.charAt(charPos) == searched.charAt(candCharPos)) {
                                if (candCharPos + 1 < searched.length()) {
                                    candCharPos++;
                                } else if (newList) {
                                    TextSegment segment = new TextSegment();
                                    segment.setBeginRun(beginRunPos);
                                    segment.setBeginText(beginTextPos);
                                    segment.setBeginChar(beginCharPos);
                                    segment.setEndRun(runPos);
                                    segment.setEndText(textPos);
                                    segment.setEndChar(charPos);
                                    return segment;
                                }
                            } else {
                                candCharPos = 0;
                            }
                        }
                    }
                    textPos++;
                } else if (o instanceof CTProofErr) {
                    c.removeXml();
                } else if (o instanceof CTRPr) {
                    //do nothing
                } else {
                    candCharPos = 0;
                }
            }
        } finally {
            c.dispose();
        }
    }
    return null;
}
It would be called like this:
...
while((foundTextSegment = searchText(paragraph, textToFind, startPos)) != null) {
...
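As someone commented on your question, you cannot control where or when Word splits a paragraph into runs. If the other answers still do not help you, here is how I worked around it:
First of all, this "solution" has a big problem, but I am putting it here anyway because someone may be able to fix it.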
// needs: import java.util.List;
public void mainMethod(XWPFParagraph paragraph) {
    if (paragraph.getRuns().size() > 1) {
        String myRun = unifyRuns(paragraph.getRuns());
        // verify and replace the ${...} placeholders in myRun here
        paragraph.getRuns().get(0).setText(myRun, 0); // position 0 overwrites the run's text instead of appending to it
        while (paragraph.getRuns().size() > 1) {
            paragraph.removeRun(1);
        }
    }
}

private String unifyRuns(List<XWPFRun> runElements) {
    StringBuilder unifiedRun = new StringBuilder();
    for (XWPFRun run : runElements) {
        unifiedRun.append(run.text()); // append the text content of each run
    }
    return unifiedRun.toString();
}
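The code may contain some errors, since I wrote it from memory.
The problem here is that Word does not split a paragraph into runs for no reason: when text has different font properties (such as font family or font size), Word puts it into separate runs.
In the text "This is my bold text", Word splits the text to separate the bold text from the normal text. So if you use POI to create large documents with varied formatting, the code above is a poor solution, because merging all runs into the first one discards the formatting of the others. In that case you first need to check whether a run is actually bold, and only then process the placeholder.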
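One possible way around that problem, sketched here without POI, is to merge only neighbouring runs that share the same formatting. Everything in this sketch is an assumption made for illustration: the Run class with a single bold flag stands in for the full set of run properties, and FormatAwareMerge is a hypothetical name. A real implementation would have to compare all relevant properties of the XWPFRun objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: merge adjacent runs only when their formatting matches. */
final class FormatAwareMerge {

    /** Hypothetical simplified run: text plus a single formatting flag. */
    static final class Run {
        final String text;
        final boolean bold;

        Run(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    static List<Run> mergeAdjacent(List<Run> runs) {
        List<Run> merged = new ArrayList<>();
        for (Run run : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).bold == run.bold) {
                // Same formatting as the previous run: extend it instead of adding a new one.
                merged.set(last, new Run(merged.get(last).text + run.text, run.bold));
            } else {
                merged.add(run);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run("${", false), new Run("education", false), new Run("}", false),
                new Run(" is ", false), new Run("bold", true));
        for (Run run : mergeAdjacent(runs)) {
            System.out.println((run.bold ? "[bold] " : "[plain] ") + run.text);
        }
    }
}
```

With this approach a placeholder that Word split into identically formatted runs ends up in one run, while the bold run stays separate. A placeholder whose parts are formatted differently would still be split.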
Again, this is the "solution" I found, but it is not finished. Sorry for any mistakes in my English; I am using Google Translate to write this answer.