How to extract numbers from a captcha image in Python?

I want to extract the numbers from a captcha image, so I tried this code from this answer:

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
import cv2

file = 'sample.jpg'

img = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, None, fx=10, fy=10, interpolation=cv2.INTER_LINEAR)
img = cv2.medianBlur(img, 9)
th, img = cv2.threshold(img, 185, 255, cv2.THRESH_BINARY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (4,8))
img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
cv2.imwrite("sample2.jpg", img)


file = 'sample2.jpg'
text = pytesseract.image_to_string(file)
print(''.join(x for x in text if x.isdigit()))

It works well for this image:

Output: 436359
But when I tried it on this image:

It gave me nothing. Output:
How can I modify my code to get the digit string from the second image?

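Edit:
I tried the suggested answer, and it works well for the image above. But it fails to recognize the digits (8, 1) in image A and the digit (7) in image B.
Image A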

Image B
How can I fix this?

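In general, getting OCR to work well on images like this comes down to the order and parameters of the transformations. For example, in the snippet below, I first convert to grayscale, then erode, then dilate, then erode again. I then threshold to get a binary image (pure black and white), and dilate and erode once more. This produced the correct value 859917 for me and should be reproducible.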

import cv2
import numpy as np
import pytesseract

file = 'sample2.jpg'
img = cv2.imread(file)
if img is None:
    # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError(file)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Erode, dilate, then erode again on the grayscale image
ekernel = np.ones((1, 2), np.uint8)
eroded = cv2.erode(gray, ekernel, iterations=1)
dkernel = np.ones((2, 3), np.uint8)
dilated = cv2.dilate(eroded, dkernel, iterations=1)
ekernel = np.ones((2, 2), np.uint8)
eroded_again = cv2.erode(dilated, ekernel, iterations=1)

# Threshold to a binary (black-and-white) image
th, threshed = cv2.threshold(eroded_again, 200, 255, cv2.THRESH_BINARY)

# Dilate and erode once more on the binary image
dkernel = np.ones((2, 2), np.uint8)
threshed_dilated = cv2.dilate(threshed, dkernel, iterations=1)
ekernel = np.ones((2, 2), np.uint8)
threshed_eroded = cv2.erode(threshed_dilated, ekernel, iterations=1)

# Run OCR and keep only the digits
text = pytesseract.image_to_string(threshed_eroded)
print(''.join(x for x in text if x.isdigit()))
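
To see why the order of the steps matters, here is a small pure-Python sketch of what erosion and dilation do to a single row of pixels. It is only an illustration: the `erode` and `dilate` helpers and the sample row are made up for this example and are not the OpenCV functions, although `cv2.erode` and `cv2.dilate` likewise take the local minimum and maximum under the kernel. Note that on a white background with dark digits, erosion makes the strokes thicker and dilation makes them thinner.

```python
def erode(row, width=3):
    """Replace each pixel with the minimum of its neighbourhood."""
    half = width // 2
    return [min(row[max(0, i - half): i + half + 1]) for i in range(len(row))]


def dilate(row, width=3):
    """Replace each pixel with the maximum of its neighbourhood."""
    half = width // 2
    return [max(row[max(0, i - half): i + half + 1]) for i in range(len(row))]


# Hypothetical row: white background (255), a 3-pixel dark stroke,
# and one isolated dark noise pixel.
row = [255, 0, 0, 0, 255, 255, 0, 255, 255]

# Dilate first: the noise pixel disappears, then eroding restores the stroke.
dilate_then_erode = erode(dilate(row))

# Erode first: stroke and noise both grow and merge into one dark blob.
erode_then_dilate = dilate(erode(row))

print(dilate_then_erode)
print(erode_then_dilate)
```

The same two operations give very different results depending on which runs first, which is why I would tune the sequence and the kernel sizes per image type rather than expect one fixed pipeline to work everywhere.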