Pytesseract does not detect my numbers

Question

I'm writing a simple program that detects the numbers in an image using Python and pytesseract, but it always returns ♀. I'm analyzing images like this:

my image

My code for reading the numbers is:

import cv2
import pytesseract


def analizar_resultado(path):
    # Flag 1 == cv2.IMREAD_COLOR: load as a 3-channel BGR image
    image = cv2.imread(path, 1)

    # 'digits' loads Tesseract's bundled digits config (a digit whitelist)
    text = pytesseract.image_to_string(image, config='digits')
    print('texto detectado:', text)

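But I can't get it to work. I've tried more images of this type with better quality, as well as other images, but I never get any numbers back. How can I fix this? Thanks a lot.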
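The ♀ is most likely not a recognized character at all. Tesseract ends each page of its plain-text output with a form-feed page separator (`\x0c`), and a Windows console using code page 437 draws byte `0x0C` as ♀. So an output consisting only of ♀ usually means Tesseract recognized nothing. A minimal sketch for making that visible, assuming `raw` is the string returned by `pytesseract.image_to_string`; `visible_text` and `describe` are hypothetical helper names:

```python
def visible_text(raw: str) -> str:
    """Drop surrounding whitespace, including Tesseract's trailing form feed."""
    # str.strip() with no arguments also removes the form feed character
    return raw.strip()


def describe(raw: str) -> str:
    """Report what Tesseract returned, using repr() so the form feed shows up."""
    text = visible_text(raw)
    if not text:
        return f"nothing recognized (raw output: {raw!r})"
    return f"recognized {text!r}"


# Output consisting only of the page separator, as in the question
print(describe("\x0c"))          # nothing recognized (raw output: '\x0c')
# Output that actually contains digits
print(describe("0.9976\n\x0c"))  # recognized '0.9976'
```

Printing `repr(text)` instead of `text` in the original function is the quickest way to tell "empty result" apart from "wrong characters".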

Answer 1

I have a three-step solution:

1. Get each digit separately
2. Apply a threshold
3. Read the output

Part 1: Get each digit separately

You can get each digit using index variables. For example:

s_idx = 0  # start index
e_idx = int(w/5) - 10  # end index

First get the height and width of the image; then, for each digit, advance the indices:

for _ in range(0, 6):
    gry_crp = gry[0:h, s_idx:e_idx]
    s_idx = e_idx
    e_idx = s_idx + int(w/5) - 20

Result:

[Images: the six cropped digits, 0 0 9 9 7 6]
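The index arithmetic from Part 1 can be checked without OpenCV. A minimal sketch, assuming the same hand-tuned offsets as the answer (the first crop ends at `w // 5 - 10`, each later crop is `w // 5 - 20` pixels wide) and a grayscale image stored as a NumPy array; `digit_slices` and `crop_digits` are hypothetical helper names:

```python
from typing import List, Tuple

import numpy as np


def digit_slices(w: int, count: int = 6) -> List[Tuple[int, int]]:
    """Column ranges (start, end) using the answer's hand-tuned offsets."""
    s_idx = 0
    e_idx = w // 5 - 10  # the first crop is slightly wider
    ranges = []
    for _ in range(count):
        ranges.append((s_idx, e_idx))
        s_idx = e_idx                # next crop starts where this one ended
        e_idx = s_idx + w // 5 - 20  # later crops are w/5 - 20 pixels wide
    return ranges


def crop_digits(gry: np.ndarray, count: int = 6) -> List[np.ndarray]:
    """Cut a grayscale (h x w) image into full-height column strips."""
    h, w = gry.shape[:2]
    return [gry[0:h, s:e] for s, e in digit_slices(w, count)]


# Example on a blank 60 x 500 "image"
strips = crop_digits(np.zeros((60, 500), dtype=np.uint8))
print([s.shape for s in strips])
# [(60, 90), (60, 80), (60, 80), (60, 80), (60, 80), (60, 80)]
```

Because the offsets are tuned to one specific image, other images with different digit spacing will need different values.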
Part 2: Apply a threshold

[Images: the six thresholded digits, 0 0 9 9 7 6]
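In the answer's code, `cv2.THRESH_OTSU` makes OpenCV ignore the threshold argument (the `0` in the `cv2.threshold` call) and compute one from the image histogram instead. A minimal NumPy sketch of Otsu's method, which picks the threshold that maximizes the between-class variance of the two pixel groups; `otsu_threshold` and `binarize` are hypothetical names, and OpenCV's own implementation may resolve ties differently:

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return t maximizing between-class variance for classes <= t and > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    omega = np.cumsum(prob)          # weight of the class "<= t"
    mu = np.cumsum(prob * levels)    # cumulative mean up to t
    mu_total = mu[-1]

    # Between-class variance; set it to 0 where one class is empty
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Like cv2.THRESH_BINARY: pixels above t become 255, the rest 0."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)
```

The input must be an 8-bit grayscale array, which is what `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` produces.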
Part 3: Read

```
0.9976
```

Unfortunately, because of artifacts, the second zero is not recognized as a digit (it is read as '.').

If Tesseract can't read an image, try a different page segmentation mode (--psm).

Code:

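import cv2
from pytesseract import image_to_string

img = cv2.imread("A3QRw.png")
if img is None:
    raise FileNotFoundError("A3QRw.png could not be read")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]

# Column offsets are hand-tuned for this particular image
s_idx = 0  # start index
e_idx = w // 5 - 10  # end index

result = []

for _ in range(6):
    # Crop one digit: full height, one column strip
    gry_crp = gry[0:h, s_idx:e_idx]
    (h_crp, w_crp) = gry_crp.shape[:2]
    # Upscale 3x so the digit is larger when Tesseract reads it
    gry_crp = cv2.resize(gry_crp, (w_crp * 3, h_crp * 3))
    # With THRESH_OTSU the threshold is computed automatically (the 0 is ignored)
    thr = cv2.threshold(gry_crp, 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    txt = image_to_string(thr, config="--psm 6 digits")
    # Keep the first visible character; [:1] avoids an IndexError on empty output
    result.append(txt.strip()[:1])
    # Advance to the next digit
    s_idx = e_idx
    e_idx = s_idx + w // 5 - 20
    # Show each thresholded crop; press any key to continue
    cv2.imshow("thr", thr)
    cv2.waitKey(0)

cv2.destroyAllWindows()
print("".join(result))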
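One way to act on the advice above to try a different --psm, for a crop like the misread second zero, is to retry a crop with other page segmentation modes until it comes back as a digit. A minimal sketch, assuming `ocr` is any callable shaped like `pytesseract.image_to_string(image, config=...)`; `read_digit` is a hypothetical helper, and the mode order (10 = single character, 8 = single word, 7 = single text line, 6 = uniform block) is just one reasonable choice:

```python
from typing import Callable, Optional, Sequence


def read_digit(crop, ocr: Callable[..., str],
               modes: Sequence[int] = (10, 8, 7, 6)) -> Optional[str]:
    """Try each --psm mode until the crop is read as a digit."""
    for psm in modes:
        # strip() drops the trailing newline and form-feed page separator
        text = ocr(crop, config=f"--psm {psm} digits").strip()
        if text and text[0] in "0123456789":
            return text[0]
    return None  # no mode produced a digit


# Usage with the real library (requires the tesseract binary):
#   import pytesseract
#   digit = read_digit(thr, pytesseract.image_to_string)
```

Passing the OCR function in as a parameter keeps the retry logic independent of pytesseract, so it can be tried with a stub before running it on real crops.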
