从 PDF 创建图像(PNG 或 JPEG)以及图像中文本的 HTML 图像映射?

Creating image (PNG or JPEG) from PDF along with HTML image maps of text in the image?

我正在记录我维护的系统。本文档包含我在 TeX/TikZ 中创建的图表,该图表呈现为 PDF 文件。然后我将 PDF 文件转换为图像文件(PNG 通过 imagemagick),并将其包含在我的 HTML 文档中。效果很好。

现在我想为图像创建一个 image map,以便我可以添加 hyperlinks/mouseovers/etc。这是一张我希望根据我的系统变化定期更新的图像,所以如果可能的话我想自动化这个过程。

当 PDF 文件呈现为 PNG 时,有没有办法使用软件库或工具自动创建 PDF 文件中各种文本内容的图像映射?

这是我创建的 this gist 示例:

在这种情况下,我想通过在 PDF 中定位它们的边界框,将一些不同的文本字符串转换为超链接:

(都是PDF文件中的文字内容,我可以在Acrobat中select其中任何一个的文字Reader,然后复制+粘贴到我的文本编辑器中。)

有办法吗?

嗯。我找到了 Apache PDFBox 库,它包含一个名为 PrintLocations.java 的示例,它打印信息,但我不确定如何解释它,每个字形一个位置。

> java -jar print_text_locations.jar blockdiagram_example.pdf
String[37.864998,13.939003 fs=4.9813 xscale=4.9813 height=2.49065 space=2.4906502 width=5.1197815]+
String[59.185997,13.662003 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=6.6450577]A
String[130.229,13.662003 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=6.64505]B
String[198.783,13.498001 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=7.192993]C
String[86.827,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=9.699257]H
String[97.449005,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[102.00201,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.5137405]s
String[107.51601,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536])
String[156.35,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=9.234192]G
String[165.58301,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[170.136,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.513733]s
String[175.65,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536])
String[12.797,29.332 fs=9.9626 xscale=9.9626 height=4.9813 space=4.9813004 width=5.7035875]u
String[38.711,27.432999 fs=4.9813 xscale=4.9813 height=3.4022279 space=2.4906502 width=5.39624]?
String[214.641,29.332 fs=9.9626 xscale=9.9626 height=4.9813 space=4.9813004 width=4.884659]y
String[85.109,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]c
String[88.5959,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[92.473335,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]n
String[96.35077,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387131]t
String[98.28948,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r
String[100.611755,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[104.48919,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.5481873]l
String[106.03738,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.5481873]l
String[107.58556,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]e
String[111.463,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r
String[155.67801,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]a
String[159.55544,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4868927]c
String[163.04233,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387207]t
String[164.98105,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]u
String[168.85847,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]a
String[172.7359,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387207]t
String[174.67462,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]o
String[178.55205,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.322281]r
String[58.912003,65.483 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=7.192993]D
String[87.536,73.099 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=7.577202]F
String[96.740005,73.099 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[101.29201,73.099 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.5137405]s
String[106.80601,73.099 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.5525436])
String[88.983,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]s
String[92.4699,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]e
String[96.347336,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]n
String[100.22477,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]s
String[103.71167,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[107.5891,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r

不过,我确实做了一个小改动,看起来每个文本项都调用了 writeString 方法,我想我可以找到每个字符串的整体边界矩形:

/**
 * Override the default functionality of PDFTextStripper.
 */
@Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException
{
    System.out.println("text string: "+string);
    for (TextPosition text : textPositions)
    {
        System.out.println( "String[" + text.getXDirAdj() + "," +
                text.getYDirAdj() + " fs=" + text.getFontSize() + " xscale=" +
                text.getXScale() + " height=" + text.getHeightDir() + " space=" +
                text.getWidthOfSpace() + " width=" +
                text.getWidthDirAdj() + "]" + text.getUnicode() );
    }
}

github 要点中 pdf 文件的输出:

> java -jar pdf2imagemap.jar blockdiagram_example.pdf
text string: +
String[37.864998,13.939003 fs=4.9813 xscale=4.9813 height=2.49065 space=2.4906502 width=5.1197815]+
text string: A
String[59.185997,13.662003 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=6.6450577]A
text string: B
String[130.229,13.662003 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=6.64505]B
text string: C
String[198.783,13.498001 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=7.192993]C
text string: H(s)
String[86.827,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=9.699257]H
String[97.449005,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[102.00201,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.5137405]s
String[107.51601,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536])
text string: G(s)
String[156.35,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=9.234192]G
String[165.58301,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[170.136,21.278 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.513733]s
String[175.65,21.278 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536])
text string: u
String[12.797,29.332 fs=9.9626 xscale=9.9626 height=4.9813 space=4.9813004 width=5.7035875]u
text string: ?
String[38.711,27.432999 fs=4.9813 xscale=4.9813 height=3.4022279 space=2.4906502 width=5.39624]?
text string: y
String[214.641,29.332 fs=9.9626 xscale=9.9626 height=4.9813 space=4.9813004 width=4.884659]y
text string: controller
String[85.109,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]c
String[88.5959,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[92.473335,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]n
String[96.35077,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387131]t
String[98.28948,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r
String[100.611755,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[104.48919,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.5481873]l
String[106.03738,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.5481873]l
String[107.58556,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]e
String[111.463,41.419 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r
text string: actuator
String[155.67801,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]a
String[159.55544,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4868927]c
String[163.04233,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387207]t
String[164.98105,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]u
String[168.85847,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]a
String[172.7359,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=1.9387207]t
String[174.67462,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774261]o
String[178.55205,41.046 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.322281]r
text string: D
String[58.912003,65.483 fs=9.9626 xscale=9.9626 height=6.1668496 space=2.769603 width=7.192993]D
text string: F
String[87.536,73.099 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=7.577202]F
text string: (s)
String[96.740005,73.099 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.552536](
String[101.29201,73.099 fs=11.9552 xscale=11.9552 height=5.9776 space=5.9776006 width=5.5137405]s
String[106.80601,73.099 fs=11.9552 xscale=11.9552 height=5.983578 space=5.9776006 width=4.5525436])
text string: sensor
String[88.983,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]s
String[92.4699,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]e
String[96.347336,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]n
String[100.22477,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.4869003]s
String[103.71167,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=3.8774338]o
String[107.5891,91.978004 fs=6.9738 xscale=6.9738 height=4.3167825 space=1.9387167 width=2.3222733]r

我能够将以下 Python 解决方案放在一起作为起点。它将 pdf 转换为 png 并输出相应的图像映射标记。

它将输出 dpi 作为可选参数(默认为 200),以便将边界框从默认的 pdf dpi 72 正确缩放到 png 上:

from pdf2image import convert_from_path
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfpage import PDFPage

from yattag import Doc, indent

import argparse
import os


def transform_coords(lobj, mb):

    # Transform LTTextBox bounding box to image map area bounding box.
    #
    # The bounding box of each LTTextBox is specified as:
    #
    # x0: the distance from the left of the page to the left edge of the box
    # y0: the distance from the bottom of the page to the lower edge of the box
    # x1: the distance from the left of the page to the right edge of the box
    # y1: the distance from the bottom of the page to the upper edge of the box
    #
    # So the y coordinates start from the bottom of the image. But with image map
    # areas, y coordinates start from the top of the image, so here we subtract
    # the bounding box's y-axis values from the total height.

    return [lobj.x0, mb[3] - lobj.y1, lobj.x1, mb[3] - lobj.y0]


def get_imagemap(d):
    doc, tag, text = Doc().tagtext()
    with tag("map", name="map"):
        for k, v in d.items():
            doc.stag("area", shape="rect", coords=",".join(v), href="", alt=k)
    return indent(doc.getvalue())


def get_bboxes(pdf, dpi):
    fp = open(pdf, "rb")
    rsrcmgr = PDFResourceManager()
    device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    page = list(PDFPage.get_pages(fp))[0]

    interpreter.process_page(page)
    layout = device.get_result()

    # PDFminer reports bounding boxes based on a dpi of 72. I could not find a way
    # to change this, so instead I scale each coordinate by multiplying by dpi/72
    scale = dpi / 72.0

    return {
        lobj.get_text().strip(): [
            str(int(x * scale)) for x in transform_coords(lobj, page.mediabox)
        ]
        for lobj in layout
        if isinstance(lobj, LTTextBox)
    }


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("pdf")
    parser.add_argument("--dpi", type=int, default=200)

    args = parser.parse_args()

    page = list(convert_from_path(args.pdf, args.dpi))[0]
    page.save(f"{os.path.splitext(args.pdf)[0]}.png", "PNG")

    print(get_imagemap(get_bboxes(args.pdf, args.dpi)))


if __name__ == "__main__":
    main()

示例结果:

<img src="https://i.stack.imgur.com/aXWMc.png" usemap="#map">
<map name="map">
  <area shape="rect" coords="361,8,380,43" href="#" alt="B" />
  <area shape="rect" coords="434,31,500,64" href="#" alt="G(s)" />
  <area shape="rect" coords="432,93,502,117" href="#" alt="actuator" />
  <area shape="rect" coords="552,8,572,42" href="#" alt="C" />
  <area shape="rect" coords="596,58,609,86" href="#" alt="y" />
  <area shape="rect" coords="105,26,119,40" href="#" alt="+" />
  <area shape="rect" coords="107,54,122,78" href="#" alt="−" />
  <area shape="rect" coords="35,58,51,86" href="#" alt="u" />
  <area shape="rect" coords="164,8,182,43" href="#" alt="A" />
  <area shape="rect" coords="163,152,183,187" href="#" alt="D" />
  <area shape="rect" coords="241,31,311,64" href="#" alt="H(s)" />
  <area shape="rect" coords="236,94,316,118" href="#" alt="controller" />
  <area shape="rect" coords="243,175,309,208" href="#" alt="F (s)" />
  <area shape="rect" coords="247,234,305,258" href="#" alt="sensor" />
</map>