Tesseract OCR gives really bad output even with typed text

Question

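I've been trying to get Tesseract OCR to extract some digits from pre-cropped images, and it isn't working well at all, even though the images are fairly clear. I've looked around for solutions, but all the other questions I've seen on here deal with problems with cropping or skewed text.

Here's an example of my code, which tries to read the image and print the result to the command line.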

    #convert image to greyscale for OCR
    im_g = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

    #create threshold image to simplify things.
    im_t = cv2.threshold(im_g, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)[1]

    #define kernel size
    rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (20,20))

    #Apply dilation to threshold image
    im_d = cv2.dilate(im_t, rect_kernel, iterations = 1)

    #Find contours
    contours = cv2.findContours(im_t, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0]

    for cnt in contours:
        x,y,w,h = cv2.boundingRect(cnt)

        #crop
        im_c = im[y:y+h, x:x+w]

        speed = pytesseract.image_to_string(im_c)
        print(im_path +" : " + speed)

Here's an example of an image

The output from this is:

frame10008.jpg : VAeVAs}

I got minor improvements on some images by adding the following config to the Tesseract image_to_string function:

config="--psm 7"

Without the new config, it would not detect anything in this picture. Now it outputs:

frame100.jpg : | U |

Any idea what I'm doing wrong? Is there a different approach I could take to solve this problem? I'm open to not using Tesseract at all.

Answer 1

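I found a decent workaround. First, I made the image larger. Giving Tesseract more area to work with helped it a lot. Second, to get rid of non-digit output, I used the following config on the image_to_string function: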

config = "--psm 7 outputbase digits"

The line now looks like this:

speed = pytesseract.image_to_string(im_c, config = "--psm 7 outputbase digits")

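The data returned is far from perfect, but the success rate is high enough that I should be able to clean up the junk data and interpolate where Tesseract returns no digits.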
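The cleanup and interpolation could look roughly like the sketch below. It assumes one OCR string per frame, in frame order, and fills gaps by straight-line interpolation between the nearest good readings. The function names and the sample readings of 75 and 78 are made up for illustration; the two junk strings are the outputs quoted in the question.

```python
import re
from typing import List, Optional

def parse_speed(raw: str) -> Optional[int]:
    """Keep only the digits of one OCR result; return None if there are none."""
    # Simplification: digits separated by junk characters are joined together.
    digits = re.sub(r"\D", "", raw)
    return int(digits) if digits else None

def fill_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate None entries that sit between two known readings."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    # Gaps before the first or after the last known reading are left as None.
    return filled

# Hypothetical per-frame OCR strings: two good readings around two failures.
raw_results = ["75\n", "| U |", "VAeVAs}", "78\n"]
speeds = fill_gaps([parse_speed(r) for r in raw_results])
print(speeds)
```

Linear interpolation only makes sense if the value changes smoothly from frame to frame; for readings that can jump, holding the last good value may be the safer choice.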

Answer 2

I tried inverting the foreground and background pixel values and ran OCR on the image with the image_to_data function, and got the expected result: 7576

# invert so that foreground and background pixel values are swapped
gray_image = 255 - gray_image
#convert OpenCV image to PIL image data format
gray_pil = Image.fromarray(gray_image)

# OCR image
config = ('-l eng --oem 1 --psm 7')
text = pytesseract.image_to_data(gray_pil, config=config, output_type='dict')
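With the dict output type, `image_to_data` returns parallel lists keyed by column name, including `text` and `conf`, rather than a plain string. A minimal sketch of pulling the recognised text back out might look like this. The helper name, the `min_conf` threshold and the sample dict (including its confidence of 96) are illustrative assumptions, not output from the run above.

```python
from typing import Dict, List

def words_from_data(data: Dict[str, List], min_conf: float = 0.0) -> str:
    """Join the non-empty words of an image_to_data dict (hypothetical helper)."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Structural rows (page, block, line) have empty text and a conf of -1.
        # conf may arrive as a string or a number depending on the version, so coerce it.
        if text.strip() and float(conf) >= min_conf:
            words.append(text.strip())
    return " ".join(words)

# Hand-written stand-in for the dict returned above; only the two keys used are shown.
sample = {
    "text": ["", "", "", "", "7576"],
    "conf": ["-1", "-1", "-1", "-1", "96"],
}
result = words_from_data(sample)
print(result)
```

Raising `min_conf` is one way to drop low-confidence junk before it reaches later processing.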


python

tesseract