How to extract text from a web GIF file using Python

I'm trying to extract text from a GIF image with the code below. It works for PNG images but not for GIF.

import pytesseract
import io
import requests
from PIL import Image

url = requests.get('http://article.sapub.org/email/10.5923.j.aac.20190902.01.gif')
img = Image.open(io.BytesIO(url.content))
text = pytesseract.image_to_string(img)
print(text)

I get this error:

C:\python\lib\site-packages\PIL\Image.py:1048: UserWarning: Couldn't allocate palette entry for transparency
 warnings.warn("Couldn't allocate palette entry for transparency")
Traceback (most recent call last):
 File "D:/elifesciences/prox.py", line 8, in <module>
text = pytesseract.image_to_string(img)
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 345, in image_to_string
 }[output_type]()
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 344, in <lambda>
 Output.STRING: lambda: run_and_get_output(*args),
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 242, in run_and_get_output
 temp_name, input_filename = save_image(image)
File "C:\python\lib\site-packages\pytesseract\pytesseract.py", line 173, in save_image
 image.save(input_file_name, format=extension, **image.info)
File "C:\python\lib\site-packages\PIL\Image.py", line 2088, in save
 save_handler(self, fp, filename)
File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 507, in _save
 _write_single_frame(im, fp, palette)
File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 414, in _write_single_frame
 _write_local_header(fp, im, (0, 0), flags)
File "C:\python\lib\site-packages\PIL\GifImagePlugin.py", line 532, in _write_local_header
 transparency = int(transparency)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'

Process finished with exit code 1

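The traceback shows pytesseract re-saving the image as a GIF together with its info metadata, and the GIF writer failing because the transparency entry is a tuple. The idea is to convert each frame to an RGB image (RGBA in the code here) before running OCR on it, as shown below: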

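for frame in range(img.n_frames):
    img.seek(frame)                # move to the current frame
    imgrgb = img.convert('RGBA')   # new non-palette image, no longer tagged as GIF
    imgrgb.show()                  # optional: preview the frame
    text = pytesseract.image_to_string(imgrgb)
    print(text)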

Working sample: https://colab.research.google.com/drive/1ctjk3hH0HUaWv0st6UpTY-oo9C9YCQdw
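
For reference, the same frame-by-frame approach can be wrapped into a self-contained script. This is only a minimal sketch: the helper names iter_rgba_frames, ocr_gif and ocr_gif_url are made up for illustration, and the ocr parameter is an assumption added so the frame handling can be tried without Tesseract installed. It relies on the converted frames having no format set, which, as far as I can tell, makes pytesseract write them to a temporary PNG instead of a GIF.

```python
import io

from PIL import Image, ImageSequence


def iter_rgba_frames(gif_bytes):
    """Yield every frame of a GIF (given as raw bytes) as an RGBA image."""
    with Image.open(io.BytesIO(gif_bytes)) as img:
        for frame in ImageSequence.Iterator(img):
            # convert() returns a new image whose .format is None,
            # so it is no longer treated as a GIF downstream.
            yield frame.convert("RGBA")


def ocr_gif(gif_bytes, ocr=None):
    """Run OCR on each frame and return one string per frame.

    `ocr` is a hypothetical hook: any callable taking a PIL image and
    returning text. It defaults to pytesseract.image_to_string.
    """
    if ocr is None:
        # Imported lazily so the frame handling works without Tesseract.
        import pytesseract
        ocr = pytesseract.image_to_string
    return [ocr(frame) for frame in iter_rgba_frames(gif_bytes)]


def ocr_gif_url(url):
    """Download a GIF and OCR every frame (hypothetical convenience wrapper)."""
    import requests
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()  # fail early on HTTP errors
    return ocr_gif(resp.content)
```

Calling ocr_gif_url('http://article.sapub.org/email/10.5923.j.aac.20190902.01.gif') should then return a list with one string per frame, assuming the URL is still reachable and Tesseract is installed.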