Placing an image over text, by using the text position in a PDF using PDFBox.

Question

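The result is that the image is not placed correctly over the text. Am I getting the text position wrong?

This is an example of how I get the x/y coordinates and size of each character in the PDF: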

public class MyClass extends PDFTextStripper {

    pdocument = PDDocument.load(new File(fileName));

    stripper = new GetCharLocationAndSize();
    stripper.setSortByPosition(true);
    stripper.setStartPage(0);
    stripper.setEndPage(pdocument.getNumberOfPages());
    Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
    stripper.writeText(pdocument, dummy);

    /*
     * Override the default functionality of PDFTextStripper.writeString()
     */
    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {

        String imagePath = "image.jpg";
        PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, pdocument);

        PDPageContentStream contentStream = new PDPageContentStream(pdocument,
                stripper.getCurrentPage(), true, true);

        for (TextPosition text : textPositions) {
            if (text.getUnicode().equals("a")) {
                contentStream.drawImage(pdImage, text.getXDirAdj(), text.getYDirAdj(),
                        text.getWidthDirAdj(), text.getHeightDir());
            }
        }
        contentStream.close();
        pdocument.save("newdoc.pdf");
    }
}

Answer 1

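Retrieving sensible coordinates

You use text.getXDirAdj() and text.getYDirAdj() as the x and y coordinates in the content stream. This won't work, because the coordinates PDFBox uses during text extraction are transformed into a coordinate system it prefers for text extraction purposes, cf. the JavaDocs: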

/**
 * This will get the text direction adjusted x position of the character.
 * This is adjusted based on text direction so that the first character
 * in that direction is in the upper left at 0,0.
 *
 * @return The x coordinate of the text.
 */
public float getXDirAdj()

/**
 * This will get the y position of the text, adjusted so that 0,0 is upper left and it is
 * adjusted based on the text direction.
 *
 * @return The adjusted y coordinate of the character.
 */
public float getYDirAdj()
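The mismatch can be made concrete with a small sketch. It is not PDFBox code: it assumes an unrotated page with plain horizontal text, and the class and method names are hypothetical. It only shows that a y value measured downward from the top edge (as the JavaDocs above describe) differs from one measured upward from the bottom, which is the direction in which y grows in default PDF user space.

```java
// Simplified model, not PDFBox code: converts a y value measured downward
// from the top edge of a page into one measured upward from the bottom edge.
final class FlipYSketch {

    /** Hypothetical helper: flips a top-origin y into a bottom-origin y. */
    static float topOriginToBottomOrigin(float yFromTop, float pageHeight) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumed page height in points (US Letter)
        float yFromTop = 100f;   // a y value as a top-left-origin system reports it

        float yFromBottom = topOriginToBottomOrigin(yFromTop, pageHeight);

        // Drawing at yFromTop instead of yFromBottom would land near the
        // bottom of the page instead of near the top.
        System.out.println("top-origin y = " + yFromTop
                + ", bottom-origin y = " + yFromBottom);
    }
}
```

Rotated pages and other text directions complicate this picture, which is one more reason to start from the text matrix rather than trying to undo the adjustment by hand.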

For a TextPosition text you should instead use

text.getTextMatrix().getTranslateX()

and

text.getTextMatrix().getTranslateY()

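But even these values may need a correction, because PDFBox has multiplied the matrix by a translation that makes the lower left corner of the crop box the origin.

Thus, if PDRectangle cropBox is the crop box of the current page, use

text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX()

and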

text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY()

(This coordinate normalization by PDFBox is a PITA for anyone who actually wants to work with the text coordinates...)
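The correction described above is just a translation by the lower left corner of the crop box. As a minimal sketch with plain floats (the class and method names are hypothetical, and the numbers are made up for illustration), it could look like this:

```java
// Simplified model, not PDFBox code: undoes the crop box translation
// by adding the crop box's lower left corner back onto the position.
final class CropBoxOffsetSketch {

    /**
     * Hypothetical helper. translateX/translateY stand in for the values of
     * getTextMatrix().getTranslateX()/getTranslateY(); cropLowerLeftX/Y stand
     * in for cropBox.getLowerLeftX()/getLowerLeftY().
     */
    static float[] toContentStreamCoords(float translateX, float translateY,
                                         float cropLowerLeftX, float cropLowerLeftY) {
        return new float[] { translateX + cropLowerLeftX, translateY + cropLowerLeftY };
    }

    public static void main(String[] args) {
        // Assumed example values: a crop box whose lower left corner is at
        // (10, 20) and a glyph reported at (72, 700) relative to that corner.
        float[] xy = toContentStreamCoords(72f, 700f, 10f, 20f);
        System.out.println("draw at x = " + xy[0] + ", y = " + xy[1]);
    }
}
```

For the common case of a crop box starting at (0, 0) the correction is a no-op, which is why the problem often goes unnoticed until a document with a shifted crop box turns up.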

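Other issues

Your code has some other issues, one of which became clear with the document you shared: you append to the page content stream without resetting the graphics context: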

PDPageContentStream contentStream = new PDPageContentStream(pdocument,
        stripper.getCurrentPage(), true, true);

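A constructor with this signature assumes that you don't want to reset the context. Use the one with an additional boolean parameter and set it to true to request a context reset: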

PDPageContentStream contentStream = new PDPageContentStream(pdocument,
        stripper.getCurrentPage(), true, true, true);

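Now the context is reset and the position is OK again.

Both these constructors are deprecated, though, and for that reason should not be used. In the development branch they have been removed. Instead use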

PDPageContentStream contentStream = new PDPageContentStream(pdocument,
        stripper.getCurrentPage(), AppendMode.APPEND, true, true);

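This introduces another problem, though: you create a new PDPageContentStream for each writeString call. If that is done with a context reset each time, the nesting of saveGraphicsState/restoreGraphicsState pairs may become pretty deep. Thus, you should create only one such content stream per page and use it in all writeString calls for that page.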
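To illustrate how the nesting grows, here is a toy model in plain Java. It assumes that each append with context reset wraps everything already on the page in one more q ... Q pair (q and Q being the PDF operators behind saveGraphicsState and restoreGraphicsState) before adding the new operators. The class, its methods, and the operator strings are hypothetical stand-ins, not PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not PDFBox code: tracks how deep q/Q pairs nest when every
// append-with-reset wraps the existing page content once more.
final class NestingDepthSketch {

    /** Models one append with context reset. */
    static void appendWithReset(List<String> ops, String newOp) {
        ops.add(0, "q"); // save the graphics state in front of the existing content
        ops.add("Q");    // restore it before the appended content starts
        ops.add(newOp);  // the newly appended operator(s)
    }

    /** Returns the deepest q/Q nesting level found in the operator list. */
    static int maxDepth(List<String> ops) {
        int depth = 0;
        int max = 0;
        for (String op : ops) {
            if (op.equals("q")) {
                depth++;
                max = Math.max(max, depth);
            } else if (op.equals("Q")) {
                depth--;
            }
        }
        return max;
    }

    /** Nesting depth after n separate appends, i.e. one stream per writeString call. */
    static int depthAfterAppends(int n) {
        List<String> ops = new ArrayList<>();
        ops.add("BT ... ET"); // placeholder for the original page content
        for (int i = 0; i < n; i++) {
            appendWithReset(ops, "Do"); // placeholder for drawing the image
        }
        return maxDepth(ops);
    }

    public static void main(String[] args) {
        System.out.println("one stream per page:        depth " + depthAfterAppends(1));
        System.out.println("one stream per call (x50):  depth " + depthAfterAppends(50));
    }
}
```

In this model the depth equals the number of appended streams, so a page with many writeString calls ends up with correspondingly deep nesting, while a single stream per page keeps it at one level.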

Thus, your text stripper subclass might look like this:

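import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

class CoverCharByImage extends PDFTextStripper {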
    public CoverCharByImage(PDImageXObject pdImage) throws IOException {
        super();
        this.pdImage = pdImage;
    }

    final PDImageXObject pdImage;
    PDPageContentStream contentStream = null;

    @Override
    public void processPage(PDPage page) throws IOException {
        super.processPage(page);
        if (contentStream != null) {
            contentStream.close();
            contentStream = null;
        }
    }

    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
        if (contentStream == null)
            contentStream = new PDPageContentStream(document, getCurrentPage(), AppendMode.APPEND, true, true);

        PDRectangle cropBox = getCurrentPage().getCropBox();

        for (TextPosition text : textPositions) {
            if (text.getUnicode().equals("a")) {
                contentStream.drawImage(pdImage, text.getTextMatrix().getTranslateX() + cropBox.getLowerLeftX(),
                        text.getTextMatrix().getTranslateY() + cropBox.getLowerLeftY(),
                        text.getWidthDirAdj(), text.getHeightDir());
            }
        }
    }
}

(CoverCharacterByImage inner class)

and it can be used like this:

PDDocument pdocument = PDDocument.load(...);

String imagePath = ...;
PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, pdocument);

CoverCharByImage stripper = new CoverCharByImage(pdImage);
stripper.setSortByPosition(true);
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(pdocument, dummy);
pdocument.save(...);

(CoverCharacterByImage test testCoverLikeLez)

resulting in

(screenshot of the output page)

etc.
