Filtering the image and converting to text gives wrong output in Python

Question

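I want to extract specific text from an image. I have already applied some filtering to the image, but I still don't get the exact text. Also, is there a way to get only one specific piece of text from the image?

Code for filtering the image and converting it to text: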

import cv2
import pytesseract

image = cv2.imread('original.png', 0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
img = cv2.adaptiveThreshold(thresh, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
cv2.imwrite('filtered.png', img)
data = pytesseract.image_to_data(img)
print(data)

cv2.imshow('thresh', img)
cv2.waitKey()

Answer 1

You can try EasyOCR instead of pytesseract.

First, install it:

pip install easyocr

import cv2
import easyocr

image = cv2.imread('original.jpg', 0)
reader = easyocr.Reader(['en'])
result = reader.readtext(image)

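# A regular expression could be used here instead of a substring test
interested_string = 'Patrol Rewards'

# Each result entry is (bounding box, text, confidence); keep the matching text
line = [entry[1] for entry in result if interested_string in entry[1]]
print(line)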

You will get a list of the recognized strings that contain the text of interest, for example:

['Patrol Rewards: Courage Horn X 1']
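The comment in the snippet above mentions regular expressions. As a rough sketch of that idea, the code below pulls the item name and quantity out of a matching line. The sample data, the pattern, and the helper name extract_rewards are all assumptions made for illustration; the sample is only shaped like what readtext() returns, so it runs without EasyOCR installed.

```python
import re

# Hypothetical sample shaped like EasyOCR's readtext() output:
# each entry is (bounding_box, text, confidence).
sample_result = [
    ([[10, 10], [200, 10], [200, 40], [10, 40]], "Battle Result", 0.98),
    ([[10, 50], [320, 50], [320, 80], [10, 80]], "Patrol Rewards: Courage Horn X 1", 0.91),
]

# Assumed line layout: "Patrol Rewards: <item> X <count>".
PATTERN = re.compile(r"Patrol Rewards:\s*(?P<item>.+?)\s*[Xx]\s*(?P<count>\d+)\s*$")


def extract_rewards(result):
    """Return (item, count) pairs for every OCR line that matches PATTERN."""
    rewards = []
    for _bbox, text, _conf in result:
        match = PATTERN.search(text)
        if match:
            rewards.append((match.group("item"), int(match.group("count"))))
    return rewards


print(extract_rewards(sample_result))
```

With real OCR output, pass the list returned by reader.readtext(image) instead of sample_result. OCR noise may mean the pattern needs loosening.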

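EasyOCR gives the correct output here. On a CPU it is somewhat slower than pytesseract, but it is faster if you have a GPU configured. Either way, its OCR results are quite good.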
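If you would rather stay with pytesseract, the same "keep only the line I care about" idea might be sketched as follows. It assumes the data comes from pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT), which returns parallel lists keyed by names such as text, block_num, par_num and line_num. The helper name lines_containing and the sample dictionary are hypothetical, and the sample stands in for real OCR output so the snippet runs without Tesseract.

```python
from collections import defaultdict


def lines_containing(data, keyword):
    """Rebuild text lines from word-level OCR data and keep those with keyword."""
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        if word.strip():  # skip the empty entries emitted for non-word rows
            key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
            lines[key].append(word)
    joined = (" ".join(words) for words in lines.values())
    return [line for line in joined if keyword in line]


# Hypothetical sample mimicking the dictionary form of image_to_data output.
sample = {
    "block_num": [1] * 10,
    "par_num": [1] * 10,
    "line_num": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
    "text": ["", "Battle", "Result", "", "Patrol", "Rewards:",
             "Courage", "Horn", "X", "1"],
}

print(lines_containing(sample, "Patrol Rewards"))
```

This only filters whatever Tesseract recognized, so it does not help if the preprocessing step has already damaged the text.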

