What is the difference between Pytesseract and Tesserocr?

Question

I'm using Python 3.6 on Windows 10 and already have Pytesseract installed, but I came across Tesserocr, which, by the way, I can't install. What is the difference?

Answer 1

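pytesseract is just a Python binding for tesseract-ocr. So if you want to use tesseract-ocr from Python code without using the subprocess or os modules to run the tesseract-ocr command-line commands yourself, you use pytesseract. To use it, however, you must have tesseract-ocr installed.

Think of it this way. You need tesseract-ocr installed because it is the program that actually runs and performs the OCR. If you want to call it as a function from Python code, you install the pytesseract package, which lets you do that. So when you run pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'), it calls tesseract-ocr with the arguments you provided. The result is the same as running tesseract test-european.jpg -l fra. You can call it from your code, but in the end tesseract-ocr still has to run to do the actual OCR.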
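To make the "it just calls the command line" point concrete, here is a minimal sketch of doing the same thing by hand with subprocess. It is not pytesseract's actual implementation. The helper names build_tesseract_command and ocr_via_cli are made up for illustration, and the sketch assumes the tesseract executable is on your PATH. Passing stdout as the output base makes tesseract print the recognized text instead of writing a file.

```python
import subprocess


def build_tesseract_command(image_path, lang=None):
    """Build the argument list for a tesseract CLI call (hypothetical helper)."""
    # "stdout" as the output base tells tesseract to print the text
    cmd = ["tesseract", image_path, "stdout"]
    if lang:
        cmd += ["-l", lang]
    return cmd


def ocr_via_cli(image_path, lang=None):
    """Run the tesseract binary and return the recognized text.

    Assumes tesseract is installed and on PATH. Every call starts a new
    process, which is also what happens behind a CLI wrapper.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.decode("utf-8")
```

Calling ocr_via_cli('test-european.jpg', lang='fra') should then give roughly the same text as the pytesseract.image_to_string call above, provided the French language data is installed.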

Answer 2

Pytesseract is a python "wrapper" for the tesseract binary. It offers only the following functions, along with specifying flags (man page):

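get_tesseract_version Returns the Tesseract version installed on the system.
image_to_string Returns the result of a Tesseract OCR run on the image as a string.
image_to_boxes Returns a result containing recognized characters and their box boundaries.
image_to_data Returns a result containing box boundaries, confidences, and other information. Requires Tesseract 3.05+. For more information, please check the Tesseract TSV documentation.
image_to_osd Returns a result containing information about orientation and script detection.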

For more information, see the project description.
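As an illustration of what image_to_data gives you, the sketch below parses a TSV string of the kind that function returns by default and keeps only the confidently recognized words. SAMPLE_TSV is hand-written for this example, not real OCR output. The helper name confident_words and the threshold of 60 are arbitrary choices for illustration. The column names follow Tesseract's TSV layout.

```python
import csv
import io

# Hand-written sample in Tesseract's TSV layout (not real OCR output)
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t102\t92\t70\t24\t41\tw0rld\n"
)


def confident_words(tsv_text, min_conf=60.0):
    """Return words whose confidence is at least min_conf (hypothetical helper)."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row["text"] or "").strip()
        conf = float(row["conf"])  # structural rows (page, block, ...) carry -1
        if text and conf >= min_conf:
            box = tuple(int(row[k]) for k in ("left", "top", "width", "height"))
            words.append({"text": text, "conf": conf, "box": box})
    return words
```

With a real image you would pass the string returned by pytesseract.image_to_data(image) instead of SAMPLE_TSV.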

On the other hand, tesserocr interfaces directly with Tesseract's C++ API (APIExample), which is much more flexible/complex and offers advanced features.

Answer 3

In my experience, Tesserocr is much faster than Pytesseract.

Tesserocr is a Python wrapper around the Tesseract C++ API, whereas pytesseract is a wrapper around the tesseract-ocr CLI.

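So with Tesserocr you can load the model once at the beginning of your program and then run it separately as many times as you need (for example, in a loop to process video). With pytesseract, every call to image_to_string loads the model and then processes the image, which makes it slower for video processing.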
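The load-once pattern can be sketched as follows. The function names ocr_frames and ocr_files are made up for illustration. ocr_frames works with any object exposing SetImage and GetUTF8Text, and ocr_files assumes tesserocr and Pillow are installed, importing them only when it is called.

```python
def ocr_frames(api, frames):
    """OCR every frame with one already-initialized API object.

    The model was loaded when `api` was created, so the loop only pays
    for recognition, not for start-up.
    """
    texts = []
    for frame in frames:
        api.SetImage(frame)
        texts.append(api.GetUTF8Text())
    return texts


def ocr_files(paths):
    """Hypothetical convenience wrapper: OCR several image files in one session."""
    # Imported here so that ocr_frames stays usable without tesserocr
    import tesserocr
    from PIL import Image

    # The context manager releases Tesseract's resources on exit
    with tesserocr.PyTessBaseAPI() as api:
        return ocr_frames(api, (Image.open(p) for p in paths))
```

For video, the frames would come from your capture loop (converted to PIL images) rather than from files on disk.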

To install tesserocr, I just typed pip install tesserocr in the terminal.

To use tesserocr:

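import tesserocr
from PIL import Image

# Create the API object once; this is where the model gets loaded
api = tesserocr.PyTessBaseAPI()
pil_image = Image.open('sample.jpg')
api.SetImage(pil_image)
text = api.GetUTF8Text()
api.End()  # release Tesseract's resources when finished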

To install pytesseract: pip install pytesseract.

To run it:

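import pytesseract
import cv2

image = cv2.imread('sample.jpg')
# OpenCV loads images as BGR, while pytesseract assumes RGB for NumPy arrays
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
text = pytesseract.image_to_string(image)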

python

ocr

tesseract

python-tesseract