How can I properly detect LetsGoDigital font text?

Question

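I'm on Windows 10, and I'm trying to extract the digits from this image

using the pytesseract library with the language lets (see https://github.com/adrianlazaro8/Tesseract_sevenSegmentsLetsGoDigital) or LetsGoDigital (cf. https://github.com/arturaugusto/display_ocr).

I preprocessed my image (grayscale, thresholding, and erosion) to get: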

But the output of

pytesseract.image_to_string(img, lang='lets')

is empty.

Answer 1

You didn't set any specific page segmentation method. I'd opt for --psm 6 here:

Assume a single uniform block of text.

So, even without any further preprocessing, I get the correct result:

import cv2
import pytesseract

img = cv2.imread('RcVbM.jpg')

text = pytesseract.image_to_string(img, lang='lets', config='--psm 6')
print(text.replace('\n', '').replace('\f', ''))
# 004200

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.1
OpenCV:        4.5.2
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
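
If --psm 6 does not work for a different image, it may help to compare a few page segmentation modes side by side. The following is a minimal sketch, not part of the tested answer above: the helper names clean_ocr_text and try_psm_modes are hypothetical, and the set of modes tried (6, 7, 8, 13) is just one plausible selection. It assumes the lets traineddata is installed where Tesseract can find it.

```python
def clean_ocr_text(raw):
    """Remove the newline and form-feed characters Tesseract appends."""
    return raw.replace('\n', '').replace('\f', '').strip()


def try_psm_modes(image, lang='lets', modes=(6, 7, 8, 13)):
    """Run OCR once per page segmentation mode; return {psm: cleaned text}."""
    # Imported lazily so clean_ocr_text stays usable without Tesseract.
    import pytesseract

    results = {}
    for psm in modes:
        raw = pytesseract.image_to_string(
            image, lang=lang, config=f'--psm {psm}')
        results[psm] = clean_ocr_text(raw)
    return results
```

Calling try_psm_modes(cv2.imread('RcVbM.jpg')) returns a dictionary mapping each mode to the text it recognized, which makes it easy to see which modes produce an empty string.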
