使用 pytesseract 执行 OCR 时出错
Error while performing OCR using pytesseract
我想使用 pytesseract。这是我的代码。
import pytesseract
from pdf2image import convert_from_path
PDF_file = 'file.pdf'
text = ''
pages = convert_from_path(PDF_file, 500)
pageText = str(((pytesseract.image_to_string(pages[0]))))
结果我得到这个错误
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 409, in pdfinfo_from_path
proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\user\Desktop\projects\pdfparser\pdftest.py", line 13, in
pages = convert_from_path(PDF_file, 500)
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 89, in convert_from_path
page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 430, in pdfinfo_from_path
raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
正如很多评论已经指出的那样,错误信息
PDFInfoNotInstalledError( pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
准确告诉您出了什么问题:未安装 Poppler。请参阅 README 以获取这方面的帮助。
你看,pdf2image
只是 pdftoppm
命令行实用程序的包装器。在 Linux 上它是默认安装的,所以你不需要费心去安装它,但在 Windows 上它不是。
我想使用 pytesseract。这是我的代码。
import pytesseract
from pdf2image import convert_from_path
PDF_file = 'file.pdf'
text = ''
pages = convert_from_path(PDF_file, 500)
pageText = str(((pytesseract.image_to_string(pages[0]))))
结果我得到这个错误
Traceback (most recent call last): File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 409, in pdfinfo_from_path proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE) File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in init self._execute_child(args, executable, preexec_fn, close_fds, File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child hp, ht, pid, tid = _winapi.CreateProcess(executable, args, FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "C:\Users\user\Desktop\projects\pdfparser\pdftest.py", line 13, in pages = convert_from_path(PDF_file, 500) File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 89, in convert_from_path page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"] File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 430, in pdfinfo_from_path raise PDFInfoNotInstalledError( pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
正如很多评论已经指出的那样,错误信息
PDFInfoNotInstalledError( pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
准确告诉您出了什么问题:未安装 Poppler。请参阅 README 以获取这方面的帮助。
你看,pdf2image
只是 pdftoppm
命令行实用程序的包装器。在 Linux 上它是默认安装的,所以你不需要费心去安装它,但在 Windows 上它不是。