Why are parentheses (brackets) inverted in MS Word when writing to it from Java?
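I am writing to an MS Word document (.docx) with Apache POI from a JavaFX UI. The strings are Arabic. When a string contains one pair of parentheses the output is fine, but when it contains two pairs, or a quotation mark, the output is scrambled, even though it looks fine in the Eclipse console. Here is my code: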
try (var WordOutput = new FileOutputStream(whichFile, true);
     var MSDoc = new XWPFDocument(OPCPackage.open(whichFile));
) { //inside try block now
    List<XWPFTable> tables = MSDoc.getTables();
    tables.toArray();
    XWPFTable ArabicTable = tables.get(0);
    var ArabicRow = ArabicTable.getRow(0);
    ArabicRow.getCell(1).removeParagraph(0);
    //adding a paragraph with a right alignment:
    XWPFParagraph arabicParagraph = ArabicRow.getCell(1).addParagraph();
    arabicParagraph.setAlignment(ParagraphAlignment.RIGHT);
    XWPFRun run = arabicParagraph.createRun();
    //PS: this pane (on Stack Overflow) reverses the text, but the brackets face
    //the right direction.
    String theString = "(هذا نص عربي) (هذا نص عربي آخر)";
    run.setText(theString);
    ArabicRow.getCell(1).addParagraph(arabicParagraph);
    MSDoc.write(WordOutput);
} catch(Exception e) {
    //my exception handler here
}
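Solutions I have tried:
- Re-creating the string with UTF-8 encoding, which did not help:
theString = new String(theString.getBytes(), "UTF-8");
- Changing the FileOutputStream() to one that specifies an encoding, which did not work either.
- Manually reversing the parentheses with a StringBuilder, which did not work.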
Thank you in advance.
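The parentheses are not RTL characters the way your Arabic text is. In the Unicode bidirectional algorithm they are neutral characters (class ON) whose direction is resolved from the surrounding text, so if the text does not mark the direction explicitly, the parentheses can end up on the wrong side. See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types.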
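The character classes mentioned above can be checked with Java's own Unicode tables. This is a small sketch (the class name `BidiClasses` and helper `bidiClass` are hypothetical, and only `java.lang.Character` is used): it shows that `(` is a neutral, mirrored character, that an Arabic letter is strongly right-to-left, and that the LRM and RLM marks are invisible strong LTR and RTL characters. It says nothing about how Word itself lays out the text.

```java
public class BidiClasses {

    /** Returns the bidi class that Java's Unicode tables report for a character. */
    static byte bidiClass(char c) {
        return Character.getDirectionality(c);
    }

    public static void main(String[] args) {
        // '(' is a neutral (class ON): its direction comes from the surrounding text.
        System.out.println(bidiClass('(') == Character.DIRECTIONALITY_OTHER_NEUTRALS); // true
        // It is also a mirrored character: inside an RTL run it is drawn facing the other way.
        System.out.println(Character.isMirrored('(')); // true
        // U+0647 ARABIC LETTER HEH, the first letter of the Arabic text in the question,
        // is strongly right-to-left (class AL).
        System.out.println(bidiClass('\u0647') == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC); // true
        // U+200E LRM and U+200F RLM are invisible strong L and R characters.
        System.out.println(bidiClass('\u200E') == Character.DIRECTIONALITY_LEFT_TO_RIGHT); // true
        System.out.println(bidiClass('\u200F') == Character.DIRECTIONALITY_RIGHT_TO_LEFT); // true
    }
}
```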
So either mark each parenthesis with U+200E LEFT-TO-RIGHT MARK (LRM) and then the RTL characters with U+200F RIGHT-TO-LEFT MARK (RLM).
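Rather than typing the marks by hand, they can be generated. This is a minimal sketch with a hypothetical helper `markParentheses` (not part of Apache POI) that puts an LRM in front of every `(` and `)` and an RLM right after it, which is the same `\u200E(\u200F ... \u200E)\u200F` pattern as the LRM/RLM string in the answer's example. It only handles parentheses; whether `[` and `]` need the same treatment in Word is not covered here.

```java
public class BidiMarks {

    static final char LRM = '\u200E'; // U+200E LEFT-TO-RIGHT MARK
    static final char RLM = '\u200F'; // U+200F RIGHT-TO-LEFT MARK

    /**
     * Hypothetical helper: wraps every '(' and ')' as LRM, parenthesis, RLM.
     * All other characters are copied unchanged.
     */
    static String markParentheses(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(' || c == ')') {
                sb.append(LRM).append(c).append(RLM);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three Arabic letters in parentheses, written with escapes: 5 chars in total.
        String marked = markParentheses("(\u0647\u0630\u0627)");
        // Each of the two parentheses gains two invisible marks: 5 + 4 = 9.
        System.out.println(marked.length()); // 9
    }
}
```

The result would then be passed to `run.setText(...)` as in the answer's example.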
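Or use U+202E RIGHT-TO-LEFT OVERRIDE (RLO) before the text line that mixes the parentheses ( and ) with RTL characters, and U+202C POP DIRECTIONAL FORMATTING (PDF) after that line. This tells the word processor exactly where the RTL text starts and ends. That gives me the correct output.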
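`java.text.Bidi` implements the Unicode bidirectional algorithm, so it can serve as a rough stand-in for illustrating what RLO ... PDF changes. In this sketch (the class `RloDemo` and its helpers are hypothetical), the paragraph is resolved as left-to-right, presumably like a Word paragraph that is only right-aligned and has no bidi property. The leading `(` of the plain string gets an even (LTR) level, while inside RLO ... PDF the same character gets an odd (RTL) level. Word's own layout engine is not guaranteed to resolve levels identically.

```java
import java.text.Bidi;

public class RloDemo {

    static final String RLO = "\u202E"; // U+202E RIGHT-TO-LEFT OVERRIDE
    static final String PDF = "\u202C"; // U+202C POP DIRECTIONAL FORMATTING

    /** Wraps a line in RLO ... PDF. */
    static String wrapRtl(String line) {
        return RLO + line + PDF;
    }

    /** Embedding level resolved for the char at offset, with a left-to-right base direction. */
    static int levelInLtrParagraph(String text, int offset) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        return bidi.getLevelAt(offset);
    }

    public static void main(String[] args) {
        // Two Arabic words, each in parentheses, written with escapes.
        String plain = "(\u0647\u0630\u0627) (\u0646\u0635)";
        // Without marks, the leading '(' is resolved to level 0 (even = displayed LTR).
        System.out.println(levelInLtrParagraph(plain, 0)); // 0
        // Inside RLO ... PDF the same '(' (now at offset 1) is forced to level 1 (odd = RTL).
        System.out.println(levelInLtrParagraph(wrapRtl(plain), 1)); // 1
    }
}
```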
Complete example:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

public class WordTableRTLText {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes the input stream and the document even if an exception occurs
        try (FileInputStream in = new FileInputStream("./source.docx");
             XWPFDocument doc = new XWPFDocument(in)) {
            XWPFTable table = doc.getTables().get(0);
            XWPFTableRow row = table.getRow(0);
            row.getCell(1).removeParagraph(0);
            //adding a paragraph with Bidi set to force RTL text:
            XWPFParagraph arabicParagraph = row.getCell(1).addParagraph();
            CTP ctp = arabicParagraph.getCTP();
            CTPPr ctppr;
            if ((ctppr = ctp.getPPr()) == null) ctppr = ctp.addNewPPr();
            //ctppr.addNewBidi().setVal(STOnOff.ON); // up to apache poi 4.1.2
            ctppr.addNewBidi().setVal(true); // from apache poi 5.0.0 on
            XWPFRun run = arabicParagraph.createRun();
            String theString = "(هذا نص عربي) (هذا نص عربي آخر)";
            run.setText(theString); // Word does not show the parentheses correctly for this line
            run.addBreak();
            //use U+200E LEFT-TO-RIGHT MARK (LRM) and U+200F RIGHT-TO-LEFT MARK (RLM)
            theString = "\u200E(\u200Fهذا نص عربي آخر\u200E)\u200F \u200E(\u200Fهذا نص عربي\u200E)\u200F";
            run.setText(theString);
            run.addBreak();
            theString = "(هذا نص عربي) (هذا نص عربي آخر)";
            //use U+202E RIGHT-TO-LEFT OVERRIDE (RLO) before the text line that mixes parentheses and RTL characters,
            //and U+202C POP DIRECTIONAL FORMATTING (PDF) after that text line
            run.setText("\u202E" + theString + "\u202C");
            try (FileOutputStream out = new FileOutputStream("./result.docx")) {
                doc.write(out);
            }
        }
    }
}