raise TesseractError(proc.returncode, get_errors(error_string))

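I am trying to extract text from an image using the pytesseract module in Python, but I keep getting an error when I run the code below. A similar question was posted at ..... but I still get the same error. Any suggestions?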

import pytesseract as py
from PIL import Image
cmd = py.pytesseract.tesseract_cmd =r'C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe'
img=r"C:\Python\Images to text\databases.jpg"
py.image_to_string(img)

---------------------------------------------------------------------------
TesseractError                            Traceback (most recent call last)
<ipython-input-86-5e06d7c425c6> in <module>
      3 cmd = py.pytesseract.tesseract_cmd =r'C:\Users\mortiz\AppData\Local\Programs\Python\Python37-32\Scripts\pytesseract.exe'
      4 img=r"C:\Python\Images to text\databases.jpg"
----> 5 py.image_to_string(img)

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in image_to_string(image, lang, config, nice, output_type, timeout)
    346         Output.DICT: lambda: {'text': run_and_get_output(*args)},
    347         Output.STRING: lambda: run_and_get_output(*args),
--> 348     }[output_type]()
    349 
    350 

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in <lambda>()
    345         Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    346         Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 347         Output.STRING: lambda: run_and_get_output(*args),
    348     }[output_type]()
    349 

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
    256         }
    257 
--> 258         run_tesseract(**kwargs)
    259         filename = kwargs['output_filename_base'] + extsep + extension
    260         with open(filename, 'rb') as output_file:

c:\users\mortiz\appdata\local\programs\python\python37-32\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
    232     with timeout_manager(proc, timeout) as error_string:
    233         if proc.returncode:
--> 234             raise TesseractError(proc.returncode, get_errors(error_string))
    235 
    236 

TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')

You are passing a string as the image rather than an image object. Change the call to:

img=r"C:\Python\Images to text\databases.jpg"
py.image_to_string(Image.open(img))
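
It may also be worth checking where `tesseract_cmd` points. The message in the traceback, `Usage: pytesseract [-l lang] input_file`, looks like the usage text of pytesseract's own command-line script rather than output from the Tesseract OCR engine. That would suggest `pytesseract.exe` in the Python `Scripts` folder is being run in place of `tesseract.exe`. The following is a minimal sketch under that assumption. The install path `C:\Program Files\Tesseract-OCR\tesseract.exe` and the helper names `is_tesseract_binary` and `ocr_image` are hypothetical, so adjust them to your setup.

```python
import shutil
from pathlib import PureWindowsPath

# Hypothetical install location; change it to wherever Tesseract OCR actually lives.
ASSUMED_TESSERACT_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def is_tesseract_binary(path):
    """Return True if the file name is tesseract[.exe], not the pytesseract wrapper."""
    return PureWindowsPath(path).stem.lower() == "tesseract"


def ocr_image(image_path, tesseract_exe=None):
    """Run OCR on image_path with an explicitly chosen Tesseract executable."""
    # Imported lazily so the path check above can be used on its own.
    import pytesseract
    from PIL import Image

    # Prefer an explicit path, then whatever is on PATH, then the assumed default.
    exe = tesseract_exe or shutil.which("tesseract") or ASSUMED_TESSERACT_EXE
    if not is_tesseract_binary(exe):
        raise ValueError(f"{exe!r} does not look like the Tesseract OCR binary")

    pytesseract.pytesseract.tesseract_cmd = exe
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)
```

Calling `ocr_image(r"C:\Python\Images to text\databases.jpg")` would then use the OCR engine itself, provided Tesseract is installed separately from the `pytesseract` Python package.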

Alternatively, you can open the image with OpenCV, which works fine.

You can install OpenCV with pip:

pip install opencv-python

Once it is installed, you can read the image with:

import cv2
import pytesseract

# cv2.imread returns a NumPy array, which image_to_string accepts directly
image = cv2.imread('path/to/image.jpg')
text = pytesseract.image_to_string(image)
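
Two details are easy to miss with the OpenCV route. `cv2.imread` returns `None` instead of raising when the file cannot be read, and it loads pixels in BGR order, whereas the pytesseract documentation recommends converting to RGB first. A minimal sketch, where `load_rgb` and `ocr_with_opencv` are hypothetical helper names and the path is a placeholder:

```python
import os


def load_rgb(path):
    """Read an image with OpenCV and return it as an RGB array."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)

    import cv2  # imported after the path check so a bad path fails fast

    image = cv2.imread(path)
    if image is None:
        # imread signals an unreadable or unsupported file by returning None
        raise ValueError(f"OpenCV could not decode {path!r}")

    # OpenCV stores channels as BGR; convert to the RGB order pytesseract assumes
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)


def ocr_with_opencv(path):
    """Run OCR on an image loaded through OpenCV."""
    import pytesseract

    return pytesseract.image_to_string(load_rgb(path))
```

Checking the path and the `None` return explicitly turns a confusing downstream failure into a clear error at the point where the image is loaded.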