How to extract text from a web GIF file using Python

Question

I'm trying to extract text from a GIF image with the following code. It works for PNG images but not for GIF.

import pytesseract
import io
import requests
from PIL import Image

url = requests.get('http://article.sapub.org/email/10.5923.j.aac.20190902.01.gif')
img = Image.open(io.BytesIO(url.content))
text = pytesseract.image_to_string(img)
print(text)

I get this error:

C:\python\lib\site-packages\PIL\Image.py:1048: UserWarning: Couldn't allocate palette entry for transparency
  warnings.warn("Couldn't allocate palette entry for transparency")
Traceback (most recent call last):
  File "D:/elifesciences/prox.py", line 8, in <module>
    text = pytesseract.image_to_string(img)
  File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 345, in image_to_string
    }[output_type]()
  File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 344, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 242, in run_and_get_output
    temp_name, input_filename = save_image(image)
  File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 173, in save_image
    image.save(input_file_name, format=extension, **image.info)
  File "C:\python\lib\site-packages\PIL\Image.py", line 2088, in save
    save_handler(self, fp, filename)
  File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 507, in _save
    _write_single_frame(im, fp, palette)
  File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 414, in _write_single_frame
    _write_local_header(fp, im, (0, 0), flags)
  File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 532, in _write_local_header
    transparency = int(transparency)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'

Process finished with exit code 1
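The last frames of the traceback show pytesseract writing the image to a temporary file with `image.save(input_file_name, format=extension, **image.info)`, so the GIF's own metadata, including `transparency`, is handed straight back to Pillow's GIF writer. The sketch below is a minimal check on a synthetic in-memory GIF (not the file from the URL, and it does not reproduce the tuple-valued `transparency` from the error); `make_transparent_gif` and `save_relevant_attrs` are hypothetical helper names. It compares `format`, `mode` and the presence of a `transparency` entry before and after `convert('RGBA')`:

```python
import io

from PIL import Image


def make_transparent_gif() -> Image.Image:
    """Build a tiny single-frame GIF whose palette index 0 is transparent (synthetic data)."""
    frame = Image.new("P", (4, 4), 1)  # every pixel uses palette index 1
    frame.putpixel((0, 0), 0)          # one pixel uses index 0
    # Index 0 = red, index 1 = blue; the remaining 254 entries are unused
    frame.putpalette([255, 0, 0, 0, 0, 255] + [0, 0, 0] * 254)
    buf = io.BytesIO()
    frame.save(buf, format="GIF", transparency=0)  # mark index 0 as transparent
    buf.seek(0)
    return Image.open(buf)


def save_relevant_attrs(img: Image.Image) -> dict:
    """Collect the attributes that decide how the image gets re-saved."""
    return {
        "format": img.format,  # file format Pillow detected on open, or None
        "mode": img.mode,
        "has_transparency": "transparency" in img.info,
    }


gif = make_transparent_gif()
print("before:", save_relevant_attrs(gif))
print("after: ", save_relevant_attrs(gif.convert("RGBA")))
```

On the converted image `format` is `None` and the `transparency` entry is gone. Assuming the pytesseract version in the traceback chooses the temporary file's format from `image.format`, this is why converting each frame first avoids the failing GIF save path.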

Answer 1

The idea is to convert each frame to a true-color RGBA image before running OCR on it, as follows:

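import io

import pytesseract
import requests
from PIL import Image

response = requests.get('http://article.sapub.org/email/10.5923.j.aac.20190902.01.gif')
img = Image.open(io.BytesIO(response.content))

# A GIF may contain several frames; run OCR on each one
for frame in range(img.n_frames):
    img.seek(frame)
    # convert() returns a new true-color image without the GIF format tag,
    # so pytesseract no longer re-saves it as a GIF with the original transparency info
    imgrgb = img.convert('RGBA')
    imgrgb.show()  # optional: preview the frame being processed
    text = pytesseract.image_to_string(imgrgb)
    print(text)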

Working sample: https://colab.research.google.com/drive/1ctjk3hH0HUaWv0st6UpTY-oo9C9YCQdw
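Building on the answer's approach, the sketch below also flattens each frame onto a white background before OCR, assuming the text is dark on a transparent or light background. It iterates with Pillow's `ImageSequence.Iterator` instead of calling `seek()` by hand and takes the raw downloaded bytes, as in the question. The function name `gif_frames_on_white` is hypothetical, and the OCR call is left to the caller so the helper depends only on Pillow.

```python
import io

from PIL import Image, ImageSequence


def gif_frames_on_white(gif_bytes: bytes) -> list[Image.Image]:
    """Decode GIF bytes and return every frame as an opaque RGB image.

    Assumption: transparent pixels should become white so the text
    stands out against a uniform light background.
    """
    frames = []
    with Image.open(io.BytesIO(gif_bytes)) as gif:
        for frame in ImageSequence.Iterator(gif):
            # RGBA gives a real alpha channel whatever mode the frame was decoded in
            rgba = frame.convert("RGBA")
            background = Image.new("RGB", rgba.size, (255, 255, 255))
            # Use the alpha channel as the paste mask: transparent pixels stay white
            background.paste(rgba, mask=rgba.getchannel("A"))
            frames.append(background)
    return frames
```

With the question's download code this could be used as `for page in gif_frames_on_white(url.content): print(pytesseract.image_to_string(page))`. Because every frame is pasted onto an opaque RGB canvas, neither an alpha channel nor palette transparency reaches pytesseract.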

Tags: python, io, tesseract, python-imaging-library