Using Tesseract OCR for Character Segmentation Only

python
tesseract
text-segmentation

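I want to perform text segmentation on printed documents. I have already segmented the document down to individual characters, but my approach fails on touching characters. I want to use Tesseract OCR only for segmentation, not for recognition. I know Tesseract can do this, but I don't know how to access that functionality without digging into Tesseract's internal code. Can anyone give me a suggestion? I need a Python solution if possible.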

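If you can call the TessBaseAPIGetComponentImages API method, you can retrieve the segmentation at the various PageIteratorLevel levels (Symbol/Character, Word, Line, etc.) without performing actual OCR on the image.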
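
Since you asked for Python, here is a minimal sketch of that idea. It assumes the third-party tesserocr wrapper (not mentioned above), which exposes the same call as PyTessBaseAPI.GetComponentImages, together with Pillow. The function names segment and to_corner_box and the image path are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def to_corner_box(box: Dict[str, int]) -> Box:
    """Convert a {'x', 'y', 'w', 'h'} dict into (left, top, right, bottom)."""
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


def segment(image_path: str, level_name: str = "SYMBOL") -> List[Box]:
    """Return bounding boxes at the given level without running recognition.

    Assumes tesserocr and Pillow are installed; both are imported lazily so
    the helper above stays usable without them.
    """
    from PIL import Image
    from tesserocr import PyTessBaseAPI, RIL

    # level_name is one of BLOCK, PARA, TEXTLINE, WORD, SYMBOL.
    level = getattr(RIL, level_name)
    with PyTessBaseAPI() as api:
        api.SetImage(Image.open(image_path))
        # Each component is a tuple (image, box, block_id, paragraph_id),
        # where box is a dict with the keys x, y, w and h.
        components = api.GetComponentImages(level, True)
        return [to_corner_box(box) for _, box, _, _ in components]


if __name__ == "__main__":
    # "page.png" is a placeholder path; substitute your own scanned page.
    for rect in segment("page.png", "WORD"):
        print(rect)
```

Passing "SYMBOL" instead of "WORD" asks for character-level boxes. Whether touching characters are split the way you need depends on Tesseract's layout analysis, so it is worth checking the result on a few of your own pages.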
