Why are parentheses (brackets) inverted in MS Word when written to from Java?

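I'm writing to an MS Word document (.docx) using Apache POI from a JavaFX UI. The string is in Arabic. When it contains one pair of parentheses the output is fine, but when there are two pairs, or a quotation mark, the output is scrambled, even though it looks fine in Eclipse's console. Here is my code: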

try (var WordOutput = new FileOutputStream(whichFile, true);
     var MSDoc = new XWPFDocument(OPCPackage.open(whichFile));
) { //inside try block now
    List<XWPFTable> tables = MSDoc.getTables();
    tables.toArray();
    XWPFTable ArabicTable = tables.get(0);

    var ArabicRow = ArabicTable.getRow(0);
    ArabicRow.getCell(1).removeParagraph(0);

    //adding a paragraph with a right alignment:
    XWPFParagraph arabicParagraph = ArabicRow.getCell(1).addParagraph();
    arabicParagraph.setAlignment(ParagraphAlignment.RIGHT);
    XWPFRun run = arabicParagraph.createRun();

    //PS: this pane (on Stack Overflow) reverses the text, but the brackets face the right
    //direction.
    String theString = "(هذا نص عربي) (هذا نص عربي آخر)";
    run.setText(theString);

    ArabicRow.getCell(1).addParagraph(arabicParagraph);

    MSDoc.write(WordOutput);

} catch(Exception e) {
    //my exception handler here
}

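Solutions I have tried include:

  1. Recreating the string with UTF-8 encoding, which did not work:
theString = new String(theString.getBytes(), "UTF-8");
  2. Changing the FileOutputStream() to a file with a specified encoding, which did not work either.
  3. Manually reversing the parentheses with a StringBuilder, which had no effect.

Thanks in advance.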

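The parentheses are not RTL text the way your Arabic text is (in the Unicode bidirectional algorithm they are neutral characters whose direction is resolved from context). So problems arise if they are not explicitly marked as LTR text. See https://en.wikipedia.org/wiki/Bidirectional_text#Table_of_possible_BiDi_character_types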
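To see the difference between the two kinds of characters, the JDK's own bidi classification can be queried. The following is only a small sketch; the class name `BidiClassDemo` and its helper methods are made up for illustration and are not part of Apache POI.

```java
class BidiClassDemo {

    // True when the character has a strong right-to-left direction
    // (bidi class R or AL, e.g. Hebrew or Arabic letters).
    public static boolean isStrongRtl(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    // True when the character is neutral, i.e. its direction is
    // resolved from the surrounding text (parentheses, spaces, ...).
    public static boolean isNeutral(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_OTHER_NEUTRALS
            || d == Character.DIRECTIONALITY_WHITESPACE;
    }

    public static void main(String[] args) {
        // "(" + three Arabic letters + ")", written with escapes to avoid
        // source-encoding issues.
        String text = "(\u0647\u0630\u0627)";
        for (char c : text.toCharArray()) {
            System.out.printf("U+%04X strongRtl=%b neutral=%b mirrored=%b%n",
                (int) c, isStrongRtl(c), isNeutral(c), Character.isMirrored(c));
        }
    }
}
```

The parentheses are reported as neutral and mirrored, while the Arabic letters are strongly RTL. That is why the renderer has to guess a direction for the parentheses unless it is told explicitly.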

So either mark each LTR character using U+200E LEFT-TO-RIGHT MARK (LRM) and then mark the RTL characters using U+200F RIGHT-TO-LEFT MARK (RLM).
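Writing the marks by hand gets tedious, so the marking could be done by a small helper. This is a sketch under assumptions: `BidiMarks` and `markParentheses` are hypothetical names, and the helper only handles `(` and `)`, putting LRM before and RLM after each one.

```java
class BidiMarks {

    static final char LRM = '\u200E'; // LEFT-TO-RIGHT MARK
    static final char RLM = '\u200F'; // RIGHT-TO-LEFT MARK

    // Returns a copy of the text in which every parenthesis is preceded
    // by LRM and followed by RLM; all other characters are left as they are.
    public static String markParentheses(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(' || c == ')') {
                sb.append(LRM).append(c).append(RLM);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With Apache POI the result would then be passed to the run, for example `run.setText(BidiMarks.markParentheses(theString));`.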

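Or you use U+202E RIGHT-TO-LEFT OVERRIDE (RLO) before the text line that mixes LTR characters (( and )) with RTL characters, and U+202C POP DIRECTIONAL FORMATTING (PDF) after that text line. That tells the word processing software exactly where RTL starts and ends. This produces the correct output for me.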
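The override approach could also be wrapped in a tiny helper. Again this is only a sketch; `BidiOverride` and `forceRtl` are hypothetical names. Note that RLO forces every character between the two control characters to be treated as RTL, so it is best suited to lines that contain no Latin words or numbers.

```java
class BidiOverride {

    static final char RLO = '\u202E'; // RIGHT-TO-LEFT OVERRIDE
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Wraps one line of text so that everything inside is laid out RTL.
    public static String forceRtl(String line) {
        return RLO + line + PDF;
    }
}
```

Usage with a run would look like `run.setText(BidiOverride.forceRtl(theString));`.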

Complete example:

import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;


public class WordTableRTLText {
     
 public static void main(String[] args) throws Exception {

  XWPFDocument doc = new XWPFDocument(new FileInputStream("./source.docx"));
  
  XWPFTable table = doc.getTables().get(0);

  XWPFTableRow row = table.getRow(0);
  
  row.getCell(1).removeParagraph(0);

  //adding a paragraph with Bidi set to force RTL text:
  XWPFParagraph arabicParagraph = row.getCell(1).addParagraph();
  CTP ctp = arabicParagraph.getCTP();
  CTPPr ctppr;
  if ((ctppr = ctp.getPPr()) == null) ctppr = ctp.addNewPPr();
  //ctppr.addNewBidi().setVal(STOnOff.ON); // up to apache poi 4.1.2
  ctppr.addNewBidi().setVal(true); // from apache poi 5.0.0 on
  XWPFRun run = arabicParagraph.createRun();
  
  String  theString = "(هذا نص عربي) (هذا نص عربي آخر)";
  run.setText(theString); // will fail showing parentheses correctly in Word

  run.addBreak();

  //use U+200E LEFT-TO-RIGHT MARK (LRM) and U+200F RIGHT-TO-LEFT MARK (RLM)
  theString = "\u200E(\u200Fهذا نص عربي آخر\u200E)\u200F \u200E(\u200Fهذا نص عربي\u200E)\u200F";
  run.setText(theString);
  
  run.addBreak();
  
  theString = "(هذا نص عربي) (هذا نص عربي آخر)";
  //use U+202E RIGHT-TO-LEFT OVERRIDE (RLO) before the text line that mixes LTR and RTL characters,
  //and U+202C POP DIRECTIONAL FORMATTING (PDF) after that text line
  run.setText("\u202E" + theString + "\u202C");


  FileOutputStream out = new FileOutputStream("./result.docx");
  doc.write(out);
  out.close();
  doc.close();

 }
}