How to output only the whole passage [Google Cloud Vision API, document_text_detection]

Question

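I'm trying document_text_detection from the Google Cloud Vision API. It works very well for Japanese, but I have one problem. The response contains both the whole passage and fragments of the passage split at line breaks. I only need the whole passage.

Here is the response.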

Google keep の画像 テキスト化
画像文字認識で手書き文字をどこ
までテキスト化が出来るのかをテスト。
Google keep OCR機能がとれた
け使えるかを確認
この手書き文書を認献してiPhone
のGoogle keepでテキスト化して
Macで編集をするのにどれだけ
出来るかも確認する。

Google
keep
の画像
テキスト化
画像文字認識で手書き文字をどこ
までテキスト化が出来るのかをテスト
。
Google
keep
OCR機能がとれた
け使えるかを確認
この手書き文書を認献してiPhone
のGoogle
keepでテキスト化して
Macで編集をするのにどれだけ
出来るかも確認する
。

Here is my Python code.

import io
import os

from google.cloud import vision

# Point the client library at the service-account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials.json"

client = vision.ImageAnnotatorClient()
directory = 'resources/'

# Detect text in every file in the directory.
for filename in os.listdir(directory):
    with io.open(os.path.join(directory, filename), 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

    response = client.document_text_detection(image=image)
    texts = response.text_annotations

    # Print the text of every annotation in the response.
    for text in texts:
        print('{}'.format(text.description))

I read the API reference (https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#TextAnnotation) and came up with the idea of using response.full_text_annotation instead of response.text_annotations.

image = vision.types.Image(content=content)
response = client.document_text_detection(image=image)
texts = response.full_text_annotation
print('{}'.format(text))

But I got an error message.

  File "/home/kazu/language/ocr.py", line 21, in <module>
    print('{}'.format(text))
NameError: name 'text' is not defined

Could you give me any information or advice?

Thank you in advance.

Sincerely, Kazu

Answer 1

It looks like a typo. You named the variable "texts" but then try to use a variable called "text".
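
Beyond the variable name, printing the full_text_annotation object itself would dump the whole structured annotation rather than just the passage. Its text field holds the complete recognized text as one string. Below is a minimal sketch of that, not a definitive implementation. The helper names (whole_passage, whole_passage_from_annotations, detect_whole_passage) are made up for illustration, and the Image class lookup assumes that newer client releases expose vision.Image while older ones, as in the question, use vision.types.Image. The second helper relies on the first entry of text_annotations carrying the entire text, which matches the output shown in the question.

```python
import io


def whole_passage(response):
    """Return the full recognized text from a document_text_detection response."""
    return response.full_text_annotation.text


def whole_passage_from_annotations(response):
    """Alternative: the first text_annotations entry carries the entire text;
    the remaining entries are the individual fragments."""
    annotations = response.text_annotations
    return annotations[0].description if len(annotations) > 0 else ''


def detect_whole_passage(path):
    """Run OCR on one image file and return only the whole passage.

    Hypothetical helper: needs google-cloud-vision installed and
    GOOGLE_APPLICATION_CREDENTIALS set, as in the question.
    """
    from google.cloud import vision  # imported lazily; only needed for real calls

    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    # Assumption: newer client versions expose vision.Image,
    # older ones (as used in the question) expose vision.types.Image.
    image_cls = vision.Image if hasattr(vision, 'Image') else vision.types.Image
    response = client.document_text_detection(image=image_cls(content=content))
    return whole_passage(response)
```

In the loop from the question, print(detect_whole_passage(os.path.join(directory, filename))) would then print each image's passage once, without the per-fragment repeats.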

