Pytesseract - output is extremely inaccurate (MAC)

Question

I installed pytesseract via pip and its results are terrible.

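When I searched for this, it seemed that I need to give it more data, but I can't find where to put the tessdata (traineddata) files, because on a Mac there is no directory like ProgramFile\Tesseract-OCR.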

There is nothing wrong with the resolution, font, or size of the image. (Image whose result is 'ecient Sh Abu')

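Because large and clear test images work fine, I think the problem is a lack of data. But any other possible solution is welcome, as long as it can read the text with Python.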

Please help me.

Answer 1

I installed pytesseract via pip and its result is terrible.

Sometimes you need to preprocess the input image to get accurate results.

Because large and clear test images work fine, I think it is a problem about lack of data. But any other possible solution is welcomed as long as it can read text with Python.

You could say that lack of data is a problem. I think you will find morphological transformations useful.

For example, if we apply the opening operation (cv2.MORPH_OPEN, as in the code below), the result will be:

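The image looks similar to the originally posted image. However, there are slight changes in the output image (e.g. the word "Grammar" looks slightly different from the original).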
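
To make those slight changes concrete, here is a toy, dependency-free sketch of what a morphological opening does: an erosion followed by a dilation. It is not OpenCV's implementation. The 3x3 window, the way borders are handled, and the names erode, dilate, and opening are assumptions chosen purely for illustration; the code in this answer uses cv2.morphologyEx with cv2.MORPH_OPEN for the real thing.

```python
# Toy morphological opening on a binary grid (lists of 0/1).
# Assumption: a 3x3 square window; out-of-bounds cells are simply ignored.

def _window(grid, r, c):
    """Yield the values in the 3x3 window centred on (r, c)."""
    rows, cols = len(grid), len(grid[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield grid[rr][cc]

def erode(grid):
    """A cell stays 1 only if every cell in its window is 1."""
    return [[min(_window(grid, r, c)) for c in range(len(grid[0]))]
            for r in range(len(grid))]

def dilate(grid):
    """A cell becomes 1 if any cell in its window is 1."""
    return [[max(_window(grid, r, c)) for c in range(len(grid[0]))]
            for r in range(len(grid))]

def opening(grid):
    """Opening = erosion followed by dilation."""
    return dilate(erode(grid))

# A 3x3 block plus one isolated pixel at row 2, column 5.
noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

cleaned = opening(noisy)
for row in cleaned:
    print(row)  # the isolated pixel is removed, the 3x3 block is kept
```

On a real grayscale image the same idea applies to bright details smaller than the kernel, which would explain why the processed image looks similar but not identical to the original.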

Now if we read the output image:

English
Grammar Practice
ter K-SAT (1-10)

Code:

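import cv2
from pytesseract import image_to_string

# Load the image and convert it to grayscale
img = cv2.imread("6Celp.jpg")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply a morphological opening; None leaves the kernel at OpenCV's default
opn = cv2.morphologyEx(gry, cv2.MORPH_OPEN, None)

# Run OCR, then print only lines longer than 3 characters
txt = image_to_string(opn)
for line in txt.split("\n"):
    line = line.strip()
    if len(line) > 3:
        print(line)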
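
On the question of where the traineddata files go on a Mac: a minimal sketch, assuming Tesseract was installed with Homebrew, in which case the tessdata directory is often /opt/homebrew/share/tessdata (Apple Silicon) or /usr/local/share/tessdata (Intel). Those paths are assumptions, not something stated in the question; running `tesseract --list-langs` in a terminal should show the directory actually in use. The helper name build_tesseract_config is hypothetical. It only assembles the string that pytesseract accepts through its `config` argument, using Tesseract's `--tessdata-dir` and `--psm` command-line options.

```python
import shlex

def build_tesseract_config(tessdata_dir=None, psm=None):
    """Assemble a config string for pytesseract's `config` argument.

    tessdata_dir: directory containing the *.traineddata files (optional).
    psm: Tesseract page segmentation mode, e.g. 6 for a uniform text block.
    """
    parts = []
    if tessdata_dir:
        # Quote the path so directories containing spaces survive parsing.
        parts += ["--tessdata-dir", shlex.quote(tessdata_dir)]
    if psm is not None:
        parts += ["--psm", str(psm)]
    return " ".join(parts)

# Assumed Homebrew location on Apple Silicon; adjust to your machine.
cfg = build_tesseract_config("/opt/homebrew/share/tessdata", psm=6)
print(cfg)

# Usage with the preprocessed image from the code above:
#   txt = image_to_string(opn, lang="eng", config=cfg)
```

Pointing Tesseract at a different tessdata directory only helps if better traineddata files are actually there, so it is worth trying this together with the preprocessing step rather than instead of it.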

Tags: python, text, python-tesseract