Pytesseract skips "1" but not "10" in the same file

Question

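I'm using pytesseract and OpenCV to try to recognize a table of numbers. I have done a lot of work on the image (resizing, resampling, and thresholding its colors) to make it easier for pytesseract to read. Below is the image I managed to produce.

My problem is that whenever a "1" appears on its own in a row, pytesseract fails to recognize it...

Here is the image I'm trying to read (after applying all the processing mentioned above):

Here is the relevant part of the code: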

import cv2
import pytesseract

# Load the preprocessed image as grayscale (flag 0)
img = cv2.imread('test.jpg', 0)
data = pytesseract.image_to_string(img)

Here is the output:

10

499

I also tried --psm 10 and --psm 13, but the output was just gibberish, like this:

=
:x

Answer 1

Apply an inverse binary threshold:

Set the page segmentation mode to 6, which gives:

1
10
499

Code:

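import cv2
from pytesseract import image_to_string

# Load the image and convert it to grayscale
image = cv2.imread('uHLww.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Inverse binary threshold: pixels above 0 become 0, pixels equal to 0 become 255
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV)[1]
# --psm 6 tells Tesseract to assume a single uniform block of text
text = image_to_string(thresh, config="--psm 6")
print(text)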
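
To make the thresholding step concrete, here is a minimal pure-Python sketch of the rule that cv2.THRESH_BINARY_INV applies: values above the threshold map to 0, and everything else maps to maxval. The names inverse_binary_threshold and sample are illustrative only and are not part of OpenCV; the tiny "image" is made up for the example.

```python
def inverse_binary_threshold(pixels, thresh=0, maxval=255):
    """Pure-Python illustration of the THRESH_BINARY_INV rule.

    pixels is a list of rows of grayscale values (0-255). Values above
    thresh map to 0; everything else maps to maxval.
    """
    return [[0 if value > thresh else maxval for value in row] for row in pixels]


# A tiny hypothetical 2x3 "image": 0 is black, 255 is white.
sample = [
    [0, 255, 0],
    [255, 255, 0],
]
inverted = inverse_binary_threshold(sample)
print(inverted)  # [[255, 0, 255], [0, 0, 255]]
```

Note that with a threshold of 0, only pixels that are exactly 0 turn white. That suits an image that has already been binarized, as in the question; for a grayscale image with intermediate values, a higher threshold (or Otsu's method via cv2.THRESH_OTSU) would probably be a better fit.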

Second solution:

You don't even need to apply a threshold; setting psm to 6 is enough to get the result.

import cv2
from pytesseract import image_to_string

# Run OCR directly on the loaded image with --psm 6, no thresholding
print(image_to_string(cv2.imread('uHLww.png'), config="--psm 6"))
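
As a rough way to see why the mode matters, the sketch below runs OCR once per page segmentation mode and collects the results side by side. For context: Tesseract's default is mode 3 (fully automatic page segmentation), mode 6 assumes a single uniform block of text, mode 10 treats the image as a single character, and mode 13 treats it as a single raw text line, which is likely why 10 and 13 return gibberish on a multi-row table. The helper name compare_psm_modes and its ocr parameter are assumptions made for this example, not part of pytesseract.

```python
def compare_psm_modes(image, modes=(6, 10, 13), ocr=None):
    """Run OCR once per page segmentation mode and collect the results.

    ocr is any callable with the signature ocr(image, config=...). When it
    is None, pytesseract.image_to_string is imported and used, which
    requires pytesseract and the Tesseract binary to be installed.
    """
    if ocr is None:
        from pytesseract import image_to_string as ocr
    results = {}
    for mode in modes:
        # Strip surrounding whitespace so the outputs are easy to compare
        results[mode] = ocr(image, config=f"--psm {mode}").strip()
    return results
```

With the answer's image, one would call compare_psm_modes(cv2.imread('uHLww.png')) and print each mode next to its recognized text. Passing the OCR function as a parameter is only a convenience so the helper can be tried out with a stand-in function.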
