如何使用 python 从网络 gif 文件中提取文本
how to extract text from web gif file using python
我正在尝试使用以下代码从 gif 图像中提取文本,它适用于 png 格式,不适用于 gif。
import pytesseract
import io
import requests
from PIL import Image
url = requests.get('http://article.sapub.org/email/10.5923.j.aac.20190902.01.gif')
img = Image.open(io.BytesIO(url.content))
text = pytesseract.image_to_string(img)
print(text)
收到此错误
C:\python\lib\site-packages\PIL\Image.py:1048: UserWarning: Couldn't allocate palette entry for transparency
warnings.warn("Couldn't allocate palette entry for transparency")
Traceback (most recent call last):
File "D:/elifesciences/prox.py", line 8, in <module>
text = pytesseract.image_to_string(img)
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 345, in image_to_string
}[output_type]()
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 344, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 242, in run_and_get_output
temp_name, input_filename = save_image(image)
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 173, in save_image
image.save(input_file_name, format=extension, **image.info)
File "C:\python\lib\site-packages\PIL\Image.py", line 2088, in save
save_handler(self, fp, filename)
File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 507, in _save
_write_single_frame(im, fp, palette)
File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 414, in _write_single_frame
_write_local_header(fp, im, (0, 0), flags)
File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 532, in _write_local_header
transparency = int(transparency)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'
Process finished with exit code 1
想法是在对它们执行 OCR 之前将每个帧转换为 RGB 图像,如下所示 -
for frame in range(0,img.n_frames):
img.seek(frame)
imgrgb = img.convert('RGBA')
imgrgb.show()
text = pytesseract.image_to_string(imgrgb)
print(text)
工作样本 - https://colab.research.google.com/drive/1ctjk3hH0HUaWv0st6UpTY-oo9C9YCQdw
我正在尝试使用以下代码从 gif 图像中提取文本,它适用于 png 格式,不适用于 gif。
import pytesseract
import io
import requests
from PIL import Image
url = requests.get('http://article.sapub.org/email/10.5923.j.aac.20190902.01.gif')
img = Image.open(io.BytesIO(url.content))
text = pytesseract.image_to_string(img)
print(text)
收到此错误
C:\python\lib\site-packages\PIL\Image.py:1048: UserWarning: Couldn't allocate palette entry for transparency
warnings.warn("Couldn't allocate palette entry for transparency")
Traceback (most recent call last):
File "D:/elifesciences/prox.py", line 8, in <module>
text = pytesseract.image_to_string(img)
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 345, in image_to_string
}[output_type]()
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 344, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 242, in run_and_get_output
temp_name, input_filename = save_image(image)
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 173, in save_image
image.save(input_file_name, format=extension, **image.info)
File "C:\python\lib\site-packages\PIL\Image.py", line 2088, in save
save_handler(self, fp, filename)
File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 507, in _save
_write_single_frame(im, fp, palette)
File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 414, in _write_single_frame
_write_local_header(fp, im, (0, 0), flags)
File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 532, in _write_local_header
transparency = int(transparency)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'
Process finished with exit code 1
想法是在对它们执行 OCR 之前将每个帧转换为 RGB 图像,如下所示 -
for frame in range(0,img.n_frames):
img.seek(frame)
imgrgb = img.convert('RGBA')
imgrgb.show()
text = pytesseract.image_to_string(imgrgb)
print(text)
工作样本 - https://colab.research.google.com/drive/1ctjk3hH0HUaWv0st6UpTY-oo9C9YCQdw