Java-PDFbox: Creating the artifact tag for lines and underlines in tagged PDF

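I am creating an accessible PDF from a tagged PDF. The checker reports a "Path object not tagged" error. The PDF contains lines and underlined text, so I am trying to add the "ARTIFACT" tag to the untagged line items. I am able to get the lines from PDFGraphicsStreamEngine. Can anyone help me with this?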

[Images: PDF page; PAC3 error]

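You can use the PdfContentStreamEditor class (a custom helper class, not part of the PDFBox distribution) to edit the page content streams as needed by customizing and calling it like this: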

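import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
// PdfContentStreamEditor is the custom helper class; import it from wherever you placed it.

PDDocument document = ...;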
for (PDPage page : document.getDocumentCatalog().getPages()) {
    PdfContentStreamEditor markEditor = new PdfContentStreamEditor(document, page) {
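        // Nesting depth of marked content sequences already present in the stream; 0 means untagged content.
        int markedContentDepth = 0;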

        @Override
        public void beginMarkedContentSequence(COSName tag, COSDictionary properties) {
            if (inArtifact) {
                System.err.println("Structural error in content stream: Path not properly closed by path painting instruction.");
            }
            markedContentDepth++;
            super.beginMarkedContentSequence(tag, properties);
        }

        @Override
        public void endMarkedContentSequence() {
            markedContentDepth--;
            super.endMarkedContentSequence();
        }

        // True while we are inside an /Artifact sequence that this editor opened itself.
        boolean inArtifact = false;

        @Override
        protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
            String operatorString = operator.getName();

            boolean unmarked = markedContentDepth == 0;
            boolean inArtifactBefore = inArtifact;

            // An unmarked path starts here: open an artifact sequence before its first construction operator.
            if (unmarked && (!inArtifactBefore) && PATH_CONSTRUCTION.contains(operatorString)) {
                super.write(contentStreamWriter, Operator.getOperator("BMC"), Collections.singletonList(COSName.ARTIFACT));
                inArtifact = true;
            }

            super.write(contentStreamWriter, operator, operands);

            // The path has been painted (or discarded with "n"): close the artifact sequence we opened.
            if (unmarked && inArtifactBefore && PATH_PAINTING.contains(operatorString)) {
                super.write(contentStreamWriter, Operator.getOperator("EMC"), Collections.emptyList());
                inArtifact = false;
            }
        }

        final List<String> PATH_CONSTRUCTION = Arrays.asList("m", "l", "c", "v", "y", "h", "re");
        final List<String> PATH_PAINTING = Arrays.asList("s", "S", "f", "F", "f*", "B", "B*", "b", "b*", "n");
    };
    markEditor.processPage(page);
}
document.save(...);

(EditMarkedContent test testMarkUnmarkedPathsAsArtifactsTradeSimple1)

The beginMarkedContentSequence and endMarkedContentSequence overrides track the current marked content nesting depth, in particular whether the current content is marked at all.

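For instructions that are not yet marked, the write override then encloses each unmarked sequence of path construction and painting instructions in /Artifact BMC ... EMC.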
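To make the effect on the content stream easier to see, here is a minimal stand-alone sketch of the same state machine. It does not use PDFBox at all: the class ArtifactWrapSketch, its method names, and the representation of instructions as plain strings are assumptions made purely for illustration, and it assumes the depth counter is updated before an instruction is written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone model of the wrapping logic; not PDFBox code.
class ArtifactWrapSketch {
    static final List<String> PATH_CONSTRUCTION = Arrays.asList("m", "l", "c", "v", "y", "h", "re");
    static final List<String> PATH_PAINTING = Arrays.asList("s", "S", "f", "F", "f*", "B", "B*", "b", "b*", "n");

    // The operator is the last whitespace-separated token of an instruction.
    static String operatorOf(String instruction) {
        return instruction.substring(instruction.lastIndexOf(' ') + 1);
    }

    // Returns a copy of the instructions in which every unmarked path is enclosed in /Artifact BMC ... EMC.
    static List<String> wrapUnmarkedPaths(List<String> instructions) {
        List<String> out = new ArrayList<>();
        int depth = 0;              // nesting depth of pre-existing marked content
        boolean inArtifact = false; // true while inside a sequence opened by this method
        for (String instruction : instructions) {
            String op = operatorOf(instruction);
            if (op.equals("BMC") || op.equals("BDC")) {
                depth++;
            } else if (op.equals("EMC")) {
                depth--;
            }
            boolean unmarked = depth == 0;
            boolean inArtifactBefore = inArtifact;
            if (unmarked && !inArtifactBefore && PATH_CONSTRUCTION.contains(op)) {
                out.add("/Artifact BMC");
                inArtifact = true;
            }
            out.add(instruction);
            if (unmarked && inArtifactBefore && PATH_PAINTING.contains(op)) {
                out.add("EMC");
                inArtifact = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A tagged paragraph followed by an untagged underline stroke.
        List<String> page = Arrays.asList(
                "/P <</MCID 0>> BDC", "BT", "ET", "EMC",
                "0 0 m", "100 0 l", "S");
        wrapUnmarkedPaths(page).forEach(System.out::println);
    }
}
```

In this model the tagged paragraph is left alone, while the untagged stroke ends up between "/Artifact BMC" and "EMC", which is what the checker expects for decorative lines.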

Beware, this code only considers content in the page content stream itself; it does not descend into form XObjects, patterns, etc.

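Furthermore, if the content stream has errors (e.g. path construction without painting), this code may add further errors (e.g. unbalanced marked content begin and end).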
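One way to catch that last problem is to check the balance of marked content operators after editing. The following is only a sketch under the assumption that you already have the operator names of a stream as a list of strings; the class MarkedContentBalanceSketch and the method isBalanced are hypothetical names, not PDFBox API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; works on plain operator names only.
class MarkedContentBalanceSketch {
    // Returns true if every BMC/BDC has a matching EMC and no EMC appears without an open sequence.
    static boolean isBalanced(List<String> operatorNames) {
        int depth = 0;
        for (String op : operatorNames) {
            if (op.equals("BMC") || op.equals("BDC")) {
                depth++;
            } else if (op.equals("EMC") && --depth < 0) {
                return false; // EMC without a preceding begin
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        // A path that is constructed but never painted leaves the artifact sequence open.
        System.out.println(isBalanced(Arrays.asList("BMC", "m", "l")));
        System.out.println(isBalanced(Arrays.asList("BMC", "m", "l", "S", "EMC")));
    }
}
```

If such a check fails for a page, the original stream most likely already contained a path without a painting operator, and the page should be inspected by hand before saving.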