Pass PIL Image to Google Cloud Vision without saving and reading

Question

Updates below.

Is there a way to pass a PIL Image to Google Cloud Vision?

I tried io.BytesIO, io.StringIO and Image.tobytes(), but I always get:

Traceback (most recent call last):
  File "C:\Users\...\vision_api.py", line 20, in get_text
    image = vision.Image(content)
  File "C:\...\venv\lib\site-packages\proto\message.py", line 494, in __init__
    raise TypeError(
TypeError: Invalid constructor input for Image:b'Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x80Ma\x81La\x81Ma\x81Ma\x81Ma\x80Ma\x81Ma\x81Ma\x81Ma\x8 ...

Or, if I pass the PIL Image directly:

TypeError: Invalid constructor input for Image: <PIL.Image.Image image mode=RGB size=480x300 at 0x1D707131DC0>

Here is my code:

image = Image.open(path).convert('RGB')   # Opening the saved image
cropped_image = image.crop((30, 900, 510, 1200))   # Cropping the image

vision_image = vision.Image(...)   # I tried the different options here; I need to pass the image, but I don't know how
client = vision.ImageAnnotatorClient()
response = client.text_detection(image=vision_image)   # Text detection using google-vision-api

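For clarity:

I want Google text detection to analyze only a specific part of an image saved on my disk. My idea was to crop the image with PIL and then pass the cropped image to google-vision. But I cannot pass a PIL Image to vision.Image, because I get the error above.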

From Google's documentation, this can be found in the vision.Image class:

Attributes:
        content (bytes):
            Image content, represented as a stream of bytes. Note: As
            with all ``bytes`` fields, protobuffers use a pure binary
            representation, whereas JSON representations use base64.

            Currently, this field only works for BatchAnnotateImages
            requests. It does not work for AsyncBatchAnnotateImages
            requests.
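
A note on what kind of bytes this field is likely to want: the repeating b'Ma\x81' pattern in the first traceback looks like raw RGB pixel data, which is what Image.tobytes() returns, rather than an encoded image file (a PNG would start with b'\x89PNG'). As far as I know, the API expects the bytes of an encoded file such as PNG or JPEG. The sketch below only illustrates the difference between the two; the solid fill color is an assumption chosen to reproduce the byte pattern from the traceback.

```python
from io import BytesIO
from PIL import Image

# Stand-in for the cropped image: 480x300 RGB, the size reported in the error above.
# The color (77, 97, 129) is an assumption; as bytes it is b'Ma\x81'.
im = Image.new("RGB", (480, 300), (77, 97, 129))

# Raw pixel data: no header, just width * height * 3 bytes.
raw = im.tobytes()

# Encoded PNG file, kept in memory instead of on disk.
buffer = BytesIO()
im.save(buffer, format="PNG")
png = buffer.getvalue()

print(len(raw))   # 432000, i.e. 480 * 300 * 3
print(raw[:6])    # b'Ma\x81Ma\x81'
print(png[:8])    # b'\x89PNG\r\n\x1a\n'
```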

One working option is to save the PIL Image as PNG/JPG to my disk and load it with:

with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

vision_image = vision.Image(content=content)

But this is slow and seems unnecessary. For me, the whole point of using the google-vision-api is its speed compared with open-cv.

Update as of 25 September 2021

from PIL import Image
from io import BytesIO
from google.cloud import vision


with open('images/screenshots/screenshot.png', 'rb') as image_file:
    data = image_file.read()
    try:
        image = vision.Image(content=data)
        print('worked')

    except TypeError:
        print('failed')


im = Image.open('images/screenshots/screenshot.png')
buffer = BytesIO()
im.save(buffer, format='PNG')
try:
    image = vision.Image(buffer.getvalue())
    print('worked')

except TypeError:
    print('failed')

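The first version works as expected, but I can't get the second one to work as suggested by @Mark Setchell. The first few characters (~50) are identical; the rest is completely different.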

Update as of 26 September 2021

Both inputs are of type <class 'bytes'>. The full error stack can be seen at the top of the question.

With this code:

print(input_data[:200])
print(type(input_data))

I get the following output:

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x048\x00\x00\x07\x80\x08\x06\x00\x00\x00+a\xe7\n\x00\x00\x00\x04sBIT\x08\x08\x08\x08|\x08d\x88\x00\x00 \x00IDATx\x9c\xec\xbdy\xd8-\xc7Y\x1f\xf8\xab\xea>\xe7\xdb\xef\xaa\xbbk\xb3%\xcb\x8b\x16[\x12\xc6\xc8\xbb,\x1b\x03\x06\xc6\x8111\x93@2y\xc2381\x8b1\x90\x10\x9e\xf18\x93\x10\x0811\x84\x192\x0c3\x9e\x1020\x03\x03\xc3\xb0\x04\xf0C0\xc6\x96m\xc9\x96m\xed\xb2dI\x96\xaetu\xf7\xed\xdb\xcf\xe9\xae\x9a?j\xe9\xea\xbd\xba\xbb\xbaO\x9f\xef\x9e\xd7\xd6\xfd\xfat\xbf\xf5Vu-o\xbd\xf5\xeb\xb7\xde"\xef\xff\xc7\'8\x1c\x13\x07\x00\xd2\x82\xcc6\xe5\xc6\xa8B&'
<class 'bytes'>

for the working input, and:

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x048\x00\x00\x07\x80\x08\x06\x00\x00\x00+a\xe7\n\x00\x01\x00\x00IDATx\x9c\xec\xbdw\x80$\xc7u\x1f\xfc\xab\xea\xeeI\x9bw/\'\x1cr\xce\x04@\x10\x04A\x82`\x84\x95%J"\x95,\xcb\x1f%\x91T\xb0$*}\x1fM\xd9\x96\x95EY\x94(\xc9\xb6\x92i+\x90\x12\x83(3)0\x82\x08$rN\x07\\xce\xb7\xb7yBw\xd5\xf7G\x85\xaeN3\xdd=\xdd\xb3\xb3{\xfb\xc8\xc3\xceLW\xbd\xca\xaf\xde\xfb\xf5\xabW\xe4{\xdeu\x84\xa3`\xe2\x00@J\xe0Y&\xdf\x00e($\x94\x94\'p\xcc\xc3\xda\xe7Y\x0c\xf1Te\x13\xbf\xcc>\xfa:]Y=x\x84\x7f\xe8\xc23u\x1f\x91l\xfd\x99'
<class 'bytes'>

for the failing input.
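
A note on the two dumps: differing bytes do not by themselves mean the image data differs. The first dump contains an sBIT chunk and the second does not, which suggests the files were written by different encoders or with different settings. The sketch below, which assumes Pillow's compress_level option for PNG and uses a made-up gradient image, encodes the same pixels twice and shows that the byte streams differ while the decoded pixels are identical. encode_png and decode_pixels are hypothetical helper names.

```python
from io import BytesIO
from PIL import Image


def encode_png(im, compress_level):
    """Encode a PIL image as PNG in memory with the given zlib level."""
    buffer = BytesIO()
    im.save(buffer, format="PNG", compress_level=compress_level)
    return buffer.getvalue()


def decode_pixels(data):
    """Decode PNG bytes and return the raw pixel data."""
    return Image.open(BytesIO(data)).tobytes()


# A small gradient image, used only as an example input.
im = Image.new("RGB", (64, 64))
im.putdata([(x * 4, y * 4, (x + y) * 2) for y in range(64) for x in range(64)])

stored = encode_png(im, 0)   # no compression
packed = encode_png(im, 9)   # maximum compression

same_header = stored[:16] == packed[:16]   # PNG signature and start of IHDR
same_bytes = stored == packed
same_pixels = decode_pixels(stored) == decode_pixels(packed) == im.tobytes()
print(same_header, same_bytes, same_pixels)
```

So a mismatch after the header is expected when an image is re-encoded, and it is probably not the cause of the TypeError.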

Answer 1

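It would be good to have the full error stack and a more accurate code snippet. But from the information presented, this looks like a mix-up between two different "images". It is probably a copy/paste error, since the tutorials contain exactly this line: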

response = client.text_detection(image=image)

But in those tutorials, image is created by vision.Image(), so I think in the provided code it should be:

response = client.text_detection(image=vision_image)

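Because, at least if I understand the snippet correctly, image is the PIL Image and vision_image is the Vision image that should be passed to the text_detection method. So whatever is done in vision.Image() would have no effect on the error message.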

Answer 2

As far as I understand, you start with a PIL Image and want to get a PNG image in memory without going to disk. So you need this:

#!/usr/bin/env python3

from PIL import Image
from io import BytesIO

# Create PIL Image like you have - filled with red
im = Image.new('RGB', (320,240), (255,0,0))

# Create in-memory PNG - like you want for Google Cloud Vision
buffer = BytesIO()
im.save(buffer, format="PNG")

# Look at first few bytes
PNG = buffer.getvalue()
print(PNG[:20])

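It prints this, which is exactly what you would get if you wrote the image to disk as a PNG and read it back as binary, except that it is done in memory rather than going to disk: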

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01@'
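
Putting this together with the code from the question, a minimal sketch might look like the following. It assumes the TypeError comes from passing the bytes positionally: the tracebacks show vision.Image(content) and vision.Image(buffer.getvalue()), whereas the version that works in the question uses the content= keyword. pil_to_png_bytes and detect_text_in_region are hypothetical helper names, not part of either library.

```python
from io import BytesIO
from PIL import Image


def pil_to_png_bytes(im):
    """Encode a PIL image as PNG in memory and return the bytes."""
    buffer = BytesIO()
    im.save(buffer, format="PNG")
    return buffer.getvalue()


def detect_text_in_region(path, box):
    """Crop `box` out of the image at `path` and run text detection on the crop."""
    # Imported here so the encoding helper above works without the Vision client installed.
    from google.cloud import vision

    cropped = Image.open(path).convert("RGB").crop(box)
    # Pass the bytes by keyword so they land in the `content` field.
    vision_image = vision.Image(content=pil_to_png_bytes(cropped))
    client = vision.ImageAnnotatorClient()
    return client.text_detection(image=vision_image)
```

With credentials configured, it could be called as detect_text_in_region(path, (30, 900, 510, 1200)), using the crop box from the question.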

