使用 pytesseract 执行 OCR 时出错

Error while performing OCR using pytesseract

我想使用 pytesseract。这是我的代码。

import pytesseract 
from pdf2image import convert_from_path 

PDF_file = 'file.pdf'
text = '' 
pages = convert_from_path(PDF_file, 500)
pageText = str(((pytesseract.image_to_string(pages[0])))) 

结果我得到这个错误

Traceback (most recent call last): File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 409, in pdfinfo_from_path proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE) File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in init self._execute_child(args, executable, preexec_fn, close_fds, File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child hp, ht, pid, tid = _winapi.CreateProcess(executable, args, FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users\user\Desktop\projects\pdfparser\pdftest.py", line 13, in pages = convert_from_path(PDF_file, 500) File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 89, in convert_from_path page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"] File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 430, in pdfinfo_from_path raise PDFInfoNotInstalledError( pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

正如很多评论已经指出的那样,错误信息

PDFInfoNotInstalledError( pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

准确告诉您出了什么问题:未安装 Poppler。请参阅 README 以获取这方面的帮助。

你看,pdf2image 只是 pdftoppm 命令行实用程序的包装器。在 Linux 上它是默认安装的,所以你不需要费心去安装它,但在 Windows 上它不是。