Why GCP Vision API returns worse results in Python than at its online demo
I wrote a basic Python script to call and use the GCP Vision API. My goal is to send it a product image and retrieve (with OCR) the text written on the box. I have a predefined list of brands, so I can search the text returned by the API and detect which brand it is.
My Python script is the following:
import io
from google.cloud import vision
from google.cloud.vision import types
import os
import cv2
import numpy as np

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "**************************"

def detect_text(file):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()
    with io.open(file, 'rb') as image_file:
        content = image_file.read()
    image = types.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')
    for text in texts:
        print('\n"{}"'.format(text.description))
        vertices = (['({},{})'.format(vertex.x, vertex.y)
                     for vertex in text.bounding_poly.vertices])
        print('bounds: {}'.format(','.join(vertices)))

file_name = "Image.jpg"
img = cv2.imread(file_name)
detect_text(file_name)
Currently, I am experimenting with the following product image (951×335 resolution). Its brand is Acuvue.
The problem is the following. When I test the online demo of GCP Cloud Vision API, I get the following text result for this image:
FOR ASTIGMATISM 1-DAY ACUVUE MOIST WITH LACREON™ 30 Lenses BRAND CONTACT LENSES UV BLOCKING
(The returned JSON result includes all the words above, including the word Acuvue which is important to me, but the JSON is too long to post here.)
Therefore, the online demo detects the text on the product quite well, and at the very least it accurately detects the word Acuvue (i.e. the brand). However, when I call the same API in my Python script with the same image, I get the following result:
Texts:
"1.DAY
FOR ASTIGMATISM
WITH
LACREONTM
MOIS
30 Lenses
BRAND CONTACT LENSES
UV BLOCKING
"
bounds: (221,101),(887,101),(887,284),(221,284)
"1.DAY"
bounds: (221,101),(312,101),(312,125),(221,125)
"FOR"
bounds: (622,107),(657,107),(657,119),(622,119)
"ASTIGMATISM"
bounds: (664,107),(788,107),(788,119),(664,119)
"WITH"
bounds: (614,136),(647,136),(647,145),(614,145)
"LACREONTM"
bounds: (600,151),(711,146),(712,161),(601,166)
"MOIS"
bounds: (378,162),(525,153),(528,200),(381,209)
"30"
bounds: (614,177),(629,178),(629,188),(614,187)
"Lenses"
bounds: (634,178),(677,180),(677,189),(634,187)
"BRAND"
bounds: (361,210),(418,210),(418,218),(361,218)
"CONTACT"
bounds: (427,209),(505,209),(505,218),(427,218)
"LENSES"
bounds: (514,209),(576,209),(576,218),(514,218)
"UV"
bounds: (805,274),(823,274),(823,284),(805,284)
"BLOCKING"
bounds: (827,276),(887,276),(887,284),(827,284)
But this does not detect the word "Acuvue" like the demo does!
Why is this happening?
Is there something I can fix in my Python script to make it work properly?
The Vision API can detect and extract text from images. There are two annotation features that support OCR:
TEXT_DETECTION detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words, and their bounding boxes.
DOCUMENT_TEXT_DETECTION also extracts text from an image, but the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.
I suspect the web API is actually using the latter, and then filtering the results based on confidence.
A DOCUMENT_TEXT_DETECTION response includes additional layout information, such as page, block, paragraph, word, and break information, along with confidence scores for each.
In any case, I would expect (and my experience has been) that the latter method will "try harder" to find all the strings.
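To illustrate the kind of confidence-based filtering the demo might be doing, here is a hypothetical sketch that walks the layout tree of a DOCUMENT_TEXT_DETECTION response and keeps only words above a threshold (the function name and threshold are my own assumptions; the pages/blocks/paragraphs/words/symbols structure and per-word `confidence` field are part of the response format):

```python
def confident_words(full_text_annotation, min_conf=0.8):
    """Collect words from a DOCUMENT_TEXT_DETECTION response whose
    OCR confidence meets a threshold."""
    words = []
    for page in full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # A word's text is the concatenation of its symbols.
                    text = "".join(s.text for s in word.symbols)
                    if word.confidence >= min_conf:
                        words.append(text)
    return words
```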
I don't think you are doing anything wrong. There are simply two parallel detection methods. One (DOCUMENT_TEXT_DETECTION) is more intense, optimized for documents (probably for straightened, aligned and evenly spaced lines), and provides more information that some applications may not need.
So I suggest you modify your code following the Python example here.
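A minimal sketch of what the switched-over call could look like, combined with the brand search mentioned in the question (the brand list and helper names are my own assumptions; `vision.types.Image` matches the older client library version the question's script uses):

```python
# Hypothetical brand list -- substitute your own predefined brands.
BRANDS = ["Acuvue", "Biofinity", "Dailies"]

def find_brand(full_text, brands=BRANDS):
    """Return the first brand found in the OCR text, ignoring case."""
    lowered = full_text.lower()
    for brand in brands:
        if brand.lower() in lowered:
            return brand
    return None

def detect_document_text(path):
    """Call DOCUMENT_TEXT_DETECTION instead of TEXT_DETECTION."""
    # Imported here so the brand-matching helper above can be used
    # without the google-cloud-vision library installed.
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.types.Image(content=f.read())
    response = client.document_text_detection(image=image)
    # full_text_annotation holds the whole extracted string plus
    # page/block/paragraph/word structure.
    return response.full_text_annotation.text
```

You could then run `find_brand(detect_document_text("Image.jpg"))` to get the detected brand, or None if no brand from the list appears in the text.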
Finally, my guess is that the 242 you asked about is one of the escaped octal values corresponding to the UTF-8 bytes it found when trying to recognize the ™ symbol.
If you run the following snippet:
b = b"\342\204\242"  # \342\204\242 are the octal-escaped UTF-8 bytes of ™
s = b.decode('utf8')
print(s)
you will be pleased to see that it prints ™.