PDFBox hasGlyph() returns true for unsupported unicode control characters

Question

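I'm using Apache's PDFBox library to write a PdfDocumentBuilder class. Before trying to write a character to the file, I check whether it has a glyph with currentFont.hasGlyph(character). The problem is that when the character is a unicode control character such as '\u001f', hasGlyph() returns true, and encode() then throws an exception on write (see the PdfDocumentBuilder code and the stack trace below for reference).

I did some research, and the font I'm using (Courier Prime) does not appear to support these unicode control characters.

So why does hasGlyph() return true when the unicode control characters aren't supported? Of course I could strip the control characters from the line with a simple replaceAll before it reaches the writeTextWithSymbol() method, but if hasGlyph() doesn't work the way I expect, I have a bigger problem.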
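For reference, the replaceAll workaround mentioned above might look roughly like this. It is only a sketch: the class and method names are made up for illustration, and it assumes that every character in Unicode category Cc (which also covers tab, CR and LF) should simply be dropped.

```java
final class ControlCharStripper {
    // Removes every character in Unicode general category Cc (control),
    // for example U+001F, but also tab, CR and LF.
    static String strip(String line) {
        return line.replaceAll("\\p{Cc}", "");
    }

    public static void main(String[] args) {
        String cleaned = strip("abc\u001fdef");
        System.out.println(cleaned); // prints "abcdef"
    }
}
```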

PdfDocumentBuilder:

private final PDType0Font baseFont;
private PDType0Font currentFont;

// PDType0Font.load() can throw IOException, so the constructor declares it
public PdfDocumentBuilder () throws IOException {
    baseFont = PDType0Font.load(doc, this.getClass().getResourceAsStream("/CourierPrime.ttf"));
    currentFont = baseFont;
}

private void writeTextWithSymbol (String text) throws IOException {
    StringBuilder nonSymbolBuffer = new StringBuilder();
    for (char character : text.toCharArray()) {
        if (currentFont.hasGlyph(character)) {
            nonSymbolBuffer.append(character);
        } else {
            //handling writing line with symbols...
        }
    }
    if (nonSymbolBuffer.length() > 0) {
        content.showText(nonSymbolBuffer.toString());
    }
}

Stack trace:

java.lang.IllegalArgumentException: No glyph for U+001F in font CourierPrime
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:400)
at org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:351)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:316)
at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:414)
at org.main.export.PdfDocumentBuilder.writeTextWithSymbol(PdfDocumentBuilder.java:193)

Answer 1

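As explained in the comments above, hasGlyph() is not meant to take a unicode character as its argument (it expects a character code in the font's own encoding). So if you need to check whether a character can be encoded before writing it, you can do something like this: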

private void writeTextWithSymbol (String text) throws IOException {
    StringBuilder nonSymbolBuffer = new StringBuilder();
    for (char character : text.toCharArray()) {
        if (isCharacterEncodeable(character)) {
            nonSymbolBuffer.append(character);
        } else {
            //handle writing line with symbols...
        }
    }
    if (nonSymbolBuffer.length() > 0) {
        content.showText(nonSymbolBuffer.toString());
    }
}

private boolean isCharacterEncodeable (char character) throws IOException {
    try {
        currentFont.encode(Character.toString(character));
        return true;
    } catch (IllegalArgumentException iae) {
        LOGGER.trace("Character cannot be encoded", iae);
        return false;
    }
}
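One caveat with the loop above is that iterating over char values splits characters outside the Basic Multilingual Plane into two surrogates. Below is a minimal sketch of the same split-into-runs idea that iterates by code point instead. It deliberately does not depend on PDFBox: EncodableRuns, Run and the canEncode predicate are hypothetical names, and the predicate stands in for a check like isCharacterEncodeable() (a real one would have to wrap currentFont.encode() and deal with its IOException).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class EncodableRuns {
    /** A stretch of consecutive code points that are all encodable or all not. */
    static final class Run {
        final String text;
        final boolean encodable;

        Run(String text, boolean encodable) {
            this.text = text;
            this.encodable = encodable;
        }
    }

    /** Splits text into runs, asking canEncode once per code point. */
    static List<Run> split(String text, IntPredicate canEncode) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean currentEncodable = false;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            boolean encodable = canEncode.test(codePoint);
            // Close the current run when the encodable state flips
            if (current.length() > 0 && encodable != currentEncodable) {
                runs.add(new Run(current.toString(), currentEncodable));
                current.setLength(0);
            }
            current.appendCodePoint(codePoint);
            currentEncodable = encodable;
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            runs.add(new Run(current.toString(), currentEncodable));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Stand-in for a real font check: pretend everything below U+0020 has no glyph
        IntPredicate fakeFont = cp -> cp >= 0x20;
        for (Run run : split("ab\u001fc", fakeFont)) {
            System.out.println(run.encodable + " (" + run.text.length() + " chars)");
        }
    }
}
```

Encodable runs could then be passed to showText() in one call each, while the non-encodable runs go through whatever symbol handling you already have.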

java

unicode

pdfbox