How to properly recognize the number in this kind of images?

Question

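I'm trying to write a script that recognizes the number in an image, more precisely in images very similar to this one:

The numbers range from 50 to 1, but I'm having trouble reading them with pytesseract. This is the code I use to read the image: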

from PIL import Image
from pytesseract import image_to_string
im = Image.open(filename)  # filename: path to the image file
text = image_to_string(im)

All the results I get look like this:

What can I do to improve the recognition?

Answer 1

Improving the quality of the output is your "holy scripture" when working with Tesseract. Before binarization, you could first try converting your image to grayscale:

from PIL import Image
import pytesseract

im = Image.open('G9hvi.png').convert('L')
text = pytesseract.image_to_string(im)
print(text.replace('\f', ''))
# 50

Boom! Without any further preprocessing, you already get the correct result.

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.2
Pillow:        8.2.0
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
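
If grayscaling alone is not enough for some of the other images, the binarization step mentioned above could look roughly like the following. This is only a sketch under assumptions, not something tested against your images: the helper names `binarize` and `read_number`, the threshold of 128, and the Tesseract options (`--psm 7` for a single text line, plus a digit whitelist) are my own choices and will likely need tuning. Also note that the whitelist option may be ignored by some Tesseract 4.0 builds using the LSTM engine.

```python
from PIL import Image


def binarize(image, threshold=128):
    """Convert to 8-bit grayscale, then map every pixel to black or white.

    The threshold of 128 is an assumed starting point, not a tuned value.
    """
    gray = image.convert('L')
    # point() applies the function to each pixel value in 0..255.
    return gray.point(lambda p: 255 if p >= threshold else 0)


def read_number(path, threshold=128):
    """OCR a single line of digits from the image at `path` (hypothetical helper)."""
    # Imported here so that binarize() can be used without Tesseract installed.
    import pytesseract

    im = binarize(Image.open(path), threshold)
    # --psm 7 treats the image as a single text line;
    # the whitelist restricts recognition to the digits 0-9.
    config = '--psm 7 -c tessedit_char_whitelist=0123456789'
    return pytesseract.image_to_string(im, config=config).strip()
```

You would call it as `read_number('G9hvi.png')`. If the digits are light on a dark background, inverting the image before OCR may help, since Tesseract generally works best with dark text on a light background.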

Tags: python, ocr, tesseract, python-3.x, python-tesseract