Hebrew, Arabic, Yiddish text is written in reverse order in PDFBox 2.0.5

Question

I want to write some Arabic, Hebrew, and Yiddish characters to a PDF, but they come out in reverse order. I am using PDFBox 2.0.5 to create and write the PDF document.

My sample code:

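import java.io.ByteArrayOutputStream;
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Resolve the TrueType font shipped with the web application.
String relativeWebPath = "/font/arial.ttf";
String absoluteDiskPath = getServletContext().getRealPath(relativeWebPath);
File file = new File(absoluteDiskPath);

ByteArrayOutputStream output = new ByteArrayOutputStream();
PDDocument document = new PDDocument();
PDFont font = PDType0Font.load(document, file);
PDPage test = new PDPage();
document.addPage(test);
PDPageContentStream content = new PDPageContentStream(document, test);
final String EXAMPLE = "النص العربي";
System.out.print(EXAMPLE);

// Write the string as is; this is where the text ends up reversed.
content.beginText();
content.newLineAtOffset(50, 680);
content.setFont(font, 12);
content.showText(EXAMPLE);
System.out.print(EXAMPLE);
content.endText();

content.close();

document.save(output);
document.close();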

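While researching a solution, I found that this used to be handled by adding the ICU4J library to the project. That did not work for me, and the icu4j dependency was removed from PDFBox 2.0 (PDFBOX-2118).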

Answer 1

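We have to handle this on our side, because PDFBox does not handle it itself. The workaround I found is to build the reversed string for the RTL text and pass that to PDFBox, which then writes it in the correct direction.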
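For a string that consists only of RTL characters, the reversal can be sketched as below. This is a minimal illustration, not part of the original answer: the class name `NaiveRtlReverse` and the method name `reverseForPdf` are made up here, and the approach is only adequate when the whole string is a single RTL run.

```java
final class NaiveRtlReverse {

    // Reverses the string so that a left-to-right renderer draws it right to left.
    // StringBuilder.reverse() keeps surrogate pairs together.
    // Assumption: the input contains RTL characters only (no Latin text or digits).
    static String reverseForPdf(String logical) {
        return new StringBuilder(logical).reverse().toString();
    }

    public static void main(String[] args) {
        // Hebrew letters shin, lamed, vav, final mem, in logical (typing) order.
        String logical = "\u05E9\u05DC\u05D5\u05DD";
        String visual = reverseForPdf(logical);
        // In the question's code this would become content.showText(visual).
        System.out.println(visual);
    }
}
```

Mixed strings (for example Hebrew with embedded Latin words or numbers) would be garbled by a plain reversal, which is why run detection is needed.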

The next question is how to detect RTL text and how to reverse it. Both can be done with Java's `java.text.Bidi` class.
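As a rough sketch of the detection step, `java.text.Bidi` can report whether a string is left-to-right, right-to-left, or mixed. The class name `RtlDetection` and the method `classify` are hypothetical names used only for this illustration.

```java
import java.text.Bidi;

final class RtlDetection {

    // Returns "LTR", "RTL" or "MIXED" for the given text.
    static String classify(String text) {
        // Cheap pre-check: false means the text needs no bidi analysis at all.
        if (!Bidi.requiresBidi(text.toCharArray(), 0, text.length())) {
            return "LTR";
        }
        // Base direction is taken from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isRightToLeft()) {
            return "RTL";
        }
        if (bidi.isMixed()) {
            return "MIXED";
        }
        return "LTR";
    }

    public static void main(String[] args) {
        System.out.println(classify("Hello"));
        System.out.println(classify("\u05E9\u05DC\u05D5\u05DD"));       // Hebrew only
        System.out.println(classify("Hello \u05E9\u05DC\u05D5\u05DD")); // Latin + Hebrew
    }
}
```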

Here is the complete solution, which also handles mixed strings:

    import java.text.Bidi;

    // Wrapper method (name chosen here) so the return statements compile.
    // Call it as content.showText(toVisualOrder(EXAMPLE)) with the string from the question.
    static String toVisualOrder(String word) {
        Bidi bidi = new Bidi(word, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT); // the constant is -2
        // Pure left-to-right text needs no reordering.
        if (!bidi.isMixed() && bidi.getBaseLevel() == 0) {
            return word;
        }

        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        Integer[] runs = new Integer[runCount];

        for (int i = 0; i < runCount; ++i) {
            levels[i] = (byte) bidi.getRunLevel(i);
            runs[i] = i;
        }

        // Put the run indices into visual order according to their levels.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        StringBuilder bidiText = new StringBuilder();

        for (int i = 0; i < runCount; ++i) {
            int index = runs[i];
            int start = bidi.getRunStart(index);
            int end = bidi.getRunLimit(index);
            if ((levels[index] & 1) != 0) {
                // Odd level: an RTL run, so append its characters back to front.
                // Mirrored characters such as brackets are copied unchanged.
                for (int pos = end - 1; pos >= start; --pos) {
                    bidiText.append(word.charAt(pos));
                }
            }
            else {
                // Even level: an LTR run, copied as is.
                bidiText.append(word, start, end);
            }
        }

        return bidiText.toString();
    }

This solved my problem. I hope it helps others.
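To see what the solution iterates over, the following sketch lists the directional runs that `java.text.Bidi` finds in a mixed string. The class name `BidiRuns` and the method `describeRuns` are illustrative names, not part of the original answer.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

final class BidiRuns {

    // Describes each run as "start-limit level N"; odd levels are RTL runs.
    static List<String> describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            out.add(bidi.getRunStart(i) + "-" + bidi.getRunLimit(i)
                    + " level " + bidi.getRunLevel(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Latin text, three Hebrew letters (alef, bet, gimel), Latin text.
        for (String run : describeRuns("abc \u05D0\u05D1\u05D2 def")) {
            System.out.println(run);
        }
    }
}
```

Only the runs with an odd level are reversed by the solution; even-level runs are copied unchanged.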
