Why is Tesseract unable to detect the single digit in that image?

Question

I have this image, which I am trying to read with Tesseract:

My code looks like this:

pytesseract.image_to_string(im)

However, all I get is LOW: 56, so Tesseract fails to read the 1 in the first line. I also tried specifying a digits-only whitelist, like

pytesseract.image_to_string(im, config="tessedit_char_whitelist=0123456789.")

and preprocessing the image with erosion, but nothing worked. Any suggestions?

Answer 1

Improving the quality of the output is your "holy scripture" when working with Tesseract. In particular, the page segmentation mode should always be set explicitly. Here (as in most cases), I would opt for --psm 6:

Assume a single uniform block of text.

Even without any further preprocessing of the image, you already get the desired result:

import cv2
import pytesseract

image = cv2.imread('gBrcd.png')
text = pytesseract.image_to_string(image, config='--psm 6')
print(text.replace('\f', ''))
# 1
# LOW: 56
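If it is unclear which page segmentation mode suits an image, one option is to run several modes and compare the outputs. The following is only a minimal sketch: `build_config` and `compare_psm_modes` are hypothetical helper names, not part of pytesseract, and the chosen modes (3 is Tesseract's default fully automatic segmentation, 6 a single uniform block, 7 a single text line, 11 sparse text) are just an illustrative selection. It also assumes that a character whitelist is passed as a Tesseract variable via `-c`, and that whether the whitelist is honored can depend on the Tesseract version and OCR engine in use.

```python
def build_config(psm, whitelist=None):
    """Build a pytesseract config string for a given page segmentation mode.

    Hypothetical helper: the optional whitelist is passed as a Tesseract
    variable using the -c flag.
    """
    parts = [f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def compare_psm_modes(image, modes=(3, 6, 7, 11), whitelist=None):
    """Run OCR once per page segmentation mode and collect the results.

    Hypothetical helper: returns a dict mapping each mode to the text
    Tesseract recognized with it.
    """
    # Imported lazily so build_config can be used without pytesseract installed.
    import pytesseract

    results = {}
    for psm in modes:
        text = pytesseract.image_to_string(image, config=build_config(psm, whitelist))
        # Strip the trailing form feed and surrounding whitespace.
        results[psm] = text.replace("\f", "").strip()
    return results
```

With the image loaded as above, `compare_psm_modes(image)` would show side by side what each mode recognizes, which makes it easier to see whether the isolated digit is dropped only in certain modes.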

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.1
OpenCV:        4.5.2
pytesseract:   5.0.0-alpha.20201127
----------------------------------------

python

opencv

tesseract

python-tesseract