python 中无法使用 pytesseract 从 tif 图像中提取文本
unable to extract text from tif image using pytesseract in python
我无法在 Python 中使用 pytesseract 和 PIL 从 .tif 图像文件中提取文本。
它适用于 .png、.jpg 图像文件,它只在 .tif 图像文件中给出错误。
我正在使用 Python 3.7.1 版本
它在 运行 Python .tif 图像文件的代码中给出以下错误。请让我知道我做错了什么。
Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.
Traceback (most recent call last):
File "C:/Users/u88ltuc/PycharmProjects/untitled1/Image Processing/Prog1.py", line 13, in <module>
image_to_text = pytesseract.image_to_string(image, lang='eng')
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 347, in image_to_string
}[output_type]()
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 346, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 246, in run_and_get_output
with save(image) as (temp_name, input_filename):
File "C:\Program Files\Python37\lib\contextlib.py", line 112, in __enter__
return next(self.gen)
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 171, in save
image.save(input_file_name, format=extension, **image.info)
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\PIL\Image.py", line 2102, in save
save_handler(self, fp, filename)
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\PIL\TiffImagePlugin.py", line 1626, in _save
raise OSError("encoder error %d when writing image file" % s)
OSError: encoder error -2 when writing image file
下面是它的 Python 代码。
#Import modules
from PIL import Image
import pytesseract
# Include tesseract executable in your path
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Create an image object of PIL library
image = Image.open(r'C:\Users\u88ltuc\Desktop110845-e001.tif')
# pass image into pytesseract module
image_to_text = pytesseract.image_to_string(image, lang='eng')
# Print the text
print(image_to_text)
下面是tif图像及其link:
首先,您应该更改您的图片扩展名。
这也许可以解决您的问题:
from PIL import Image
from io import BytesIO
import pytesseract
img = Image.open(r"C:\Users\u88ltuc\Desktop110845-e001.tif")
TempIO = BytesIO()
img.save(TempIO,format="JPEG")
img = Image.open(BytesIO(TempIO.getvalue()))
print(pytesseract.image_to_string(img))
或者如果你不介意你的桌面有两张相同的图片,你不需要import BytesIO
,这里是:
from PIL import Image
import pytesseract
img = Image.open(r"C:\Users\u88ltuc\Desktop110845-e001.tif")
img.save(r"C:\Users\u88ltuc\Desktop110845-e001.jpg")
img = Image.open(r"C:\Users\u88ltuc\Desktop110845-e001.jpg")
print(pytesseract.image_to_string(img))
我无法在 Python 中使用 pytesseract 和 PIL 从 .tif 图像文件中提取文本。 它适用于 .png、.jpg 图像文件,它只在 .tif 图像文件中给出错误。 我正在使用 Python 3.7.1 版本
它在 运行 Python .tif 图像文件的代码中给出以下错误。请让我知道我做错了什么。
Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.
Traceback (most recent call last):
File "C:/Users/u88ltuc/PycharmProjects/untitled1/Image Processing/Prog1.py", line 13, in <module>
image_to_text = pytesseract.image_to_string(image, lang='eng')
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 347, in image_to_string
}[output_type]()
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 346, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 246, in run_and_get_output
with save(image) as (temp_name, input_filename):
File "C:\Program Files\Python37\lib\contextlib.py", line 112, in __enter__
return next(self.gen)
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\pytesseract\pytesseract.py", line 171, in save
image.save(input_file_name, format=extension, **image.info)
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\PIL\Image.py", line 2102, in save
save_handler(self, fp, filename)
File "C:\Users\u88ltuc\PycharmProjects\untitled1\venv\lib\site-packages\PIL\TiffImagePlugin.py", line 1626, in _save
raise OSError("encoder error %d when writing image file" % s)
OSError: encoder error -2 when writing image file
下面是它的 Python 代码。
#Import modules
from PIL import Image
import pytesseract
# Include tesseract executable in your path
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Create an image object of PIL library
image = Image.open(r'C:\Users\u88ltuc\Desktop110845-e001.tif')
# pass image into pytesseract module
image_to_text = pytesseract.image_to_string(image, lang='eng')
# Print the text
print(image_to_text)
下面是tif图像及其link:
首先,您应该更改您的图片扩展名。 这也许可以解决您的问题:
from PIL import Image
from io import BytesIO
import pytesseract
img = Image.open(r"C:\Users\u88ltuc\Desktop110845-e001.tif")
TempIO = BytesIO()
img.save(TempIO,format="JPEG")
img = Image.open(BytesIO(TempIO.getvalue()))
print(pytesseract.image_to_string(img))
或者如果你不介意你的桌面有两张相同的图片,你不需要import BytesIO
,这里是:
from PIL import Image
import pytesseract
img = Image.open(r"C:\Users\u88ltuc\Desktop110845-e001.tif")
img.save(r"C:\Users\u88ltuc\Desktop110845-e001.jpg")
img = Image.open(r"C:\Users\u88ltuc\Desktop110845-e001.jpg")
print(pytesseract.image_to_string(img))