Why does cv2.imread return 255 for every RGB value when reading an image that Wand created from a PDF file?

Question

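I am trying to identify blocks of text in a PDF file. For example, an academic paper has different sections: I want to identify the title as one section, the authors and addresses as another, and the abstract as another.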

One solution I'm considering is to use cv2. First, I use Wand to convert the PDF to an image with the following code:

from wand.color import Color
from wand.image import Image as Img
with Img(filename='./files/paper.pdf', resolution=300) as img:
    img.background_color = Color("white")
    img.alpha_channel = 'remove'
    img.save(filename='test_file.jpg')

However, when I try to open the jpg file in cv2 with:

import cv2

image = cv2.imread('test_file.jpg')
print(image)

the printed output shows that every pixel in the image has the value 255 in all channels:

array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       ...,

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]]], dtype=uint8)

Then, when I try to use cv2.dnn.blobFromImage(), the result is simply not correct.

What is going on here? Is the PDF not being converted to an image correctly? But when I tried

import pytesseract
from PIL import Image
text = pytesseract.image_to_string(Image.open('test_file.jpg'))

it returned all of the text...

Answer 1

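See all the dots (`...`)? Printing the image shows only a few of its pixels: NumPy summarizes large arrays, printing just the first and last three entries along each axis. For a PDF text document with a white background, it is safe to assume that the pixels along the edges are all white, and the edges (the corners of the image) are exactly what the printed summary shows.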
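To confirm that the summary is hiding the text rather than the image being blank, here is a minimal NumPy sketch on a synthetic page. The array `page` is a hypothetical stand-in; with a real file, use the array returned by `cv2.imread('test_file.jpg')` instead. JPEG compression leaves near-white noise around text, so on a real page a threshold below 255 may give a more meaningful count.

```python
import numpy as np

# Hypothetical stand-in for the rendered page: a white 600x400 BGR image
# with one dark "text" rectangle well away from the borders.
page = np.full((600, 400, 3), 255, dtype=np.uint8)
page[200:220, 100:300] = 0

# NumPy summarizes arrays larger than its print threshold (1000 elements by
# default): only the first and last 3 entries of each axis are shown around "...".
print(page)  # every printed value is 255, yet the page is not all white

# Look at the whole array rather than its printed summary.
print(page.min(), page.max())  # 0 255
non_white = np.count_nonzero(np.any(page < 255, axis=2))
print(non_white)  # 4000 pixels (20 rows x 200 columns) are not pure white
```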

To display the image, use

import cv2
image = cv2.imread('test_file.jpg')
cv2.imshow('Image', image)
cv2.waitKey(0)  # block until a key is pressed
cv2.destroyAllWindows()

This shows the image in a window and waits for you to press a key before closing it.
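If no display is available (for example on a headless server, where cv2.imshow cannot open a window), a rough alternative is to report where the dark pixels are. The helper `content_bbox` and its `threshold` of 200 below are hypothetical choices for illustration, and the synthetic `page` stands in for the array from cv2.imread.

```python
import numpy as np

def content_bbox(image: np.ndarray, threshold: int = 200):
    """Hypothetical helper: return (top, bottom, left, right) of the pixels
    darker than `threshold`, or None if the whole image is light."""
    # Use the darkest channel of each pixel for color images.
    gray = image if image.ndim == 2 else image.min(axis=2)
    rows, cols = np.nonzero(gray < threshold)
    if rows.size == 0:
        return None
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())

# Synthetic white page with one dark block (stand-in for real text).
page = np.full((600, 400, 3), 255, dtype=np.uint8)
page[200:220, 100:300] = 0
print(content_bbox(page))  # (200, 219, 100, 299)
```

A result of None would indicate a genuinely blank render; any tuple shows the content sits inside the region that the printed summary never displays.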

Answer 2

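A Wand image is not a NumPy array, so it cannot simply be opened in cv2. Wand 0.5.3 will add a way to import Wand images from, and export them to, NumPy arrays.

In Wand 0.5.2, you can use import_pixels to convert a NumPy array into a Wand image. In Wand 0.5.3, you can export a Wand image to a NumPy array that you should be able to use in cv2: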

import numpy as np
from wand.image import Image

with Image(filename='rose.png') as img: 
    matrix = np.array(img)

matrix will be a NumPy array that you should be able to use with OpenCV.
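One caveat worth checking: OpenCV functions expect BGR channel order, while an array exported from an image library is typically RGB or RGBA. A minimal sketch, assuming `matrix` has shape H x W x 3 (RGB) or H x W x 4 (RGBA); inspect `matrix.shape` to confirm. The helper name `to_opencv_bgr` is hypothetical; `cv2.cvtColor(matrix, cv2.COLOR_RGB2BGR)` (or `cv2.COLOR_RGBA2BGR`) does the same job if cv2 is already imported.

```python
import numpy as np

def to_opencv_bgr(matrix: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convert an RGB or RGBA uint8 array (channel order
    assumed, not detected) to the BGR layout OpenCV expects."""
    if matrix.ndim != 3 or matrix.shape[2] not in (3, 4):
        raise ValueError("expected an H x W x 3 or H x W x 4 array")
    rgb = matrix[:, :, :3]  # drop the alpha channel if present
    # Reverse R,G,B -> B,G,R; copy to a contiguous buffer because some
    # OpenCV functions reject arrays with negative strides.
    return np.ascontiguousarray(rgb[:, :, ::-1])

pixel = np.array([[[10, 20, 30, 255]]], dtype=np.uint8)  # one RGBA pixel
print(to_opencv_bgr(pixel))  # [[[30 20 10]]]
```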
