Understanding DetectedBreak in Google OCR full text annotations

Question

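I am trying to convert the full text annotations of a Google Vision OCR result into line-level and word-level text, starting from the Block, Paragraph, Word and Symbol hierarchy.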

However, to convert symbols to word text and words to line text, I need to understand the DetectedBreak property.

I went through this documentation, but I did not understand a few of the break types.

Can someone explain what the following breaks mean? I only understood LINE_BREAK and SPACE.

EOL_SURE_SPACE
HYPHEN
LINE_BREAK
SPACE
SURE_SPACE
UNKNOWN

Can they be replaced by either a newline character or a space?

Answer 1

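The link you provided has the most detailed explanation available of what each of these means. I think the best way to get a better understanding is to run OCR on different images and compare the response with what you see in the corresponding image. The following Python script runs DOCUMENT_TEXT_DETECTION on an image stored in GCS and prints all detected breaks except the ones you already understand (LINE_BREAK and SPACE), along with the word immediately preceding each one so you can compare.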

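# Written against google-cloud-vision releases before 2.0, which still
# provide the vision.types and vision.enums modules used below.
import sys
import os
from google.cloud import vision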

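def detect_breaks(gcs_image):
    """Print every break other than SPACE and LINE_BREAK with the text before it."""

    # Placeholder path: point this at your service account key file.
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/json'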
    client = vision.ImageAnnotatorClient()

    feature = vision.types.Feature(
        type=vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION)

    image_source = vision.types.ImageSource(
        image_uri=gcs_image)

    image = vision.types.Image(
        source=image_source)

    request = vision.types.AnnotateImageRequest(
        features=[feature], image=image)

    annotation = client.annotate_image(request).full_text_annotation

    breaks = vision.enums.TextAnnotation.DetectedBreak.BreakType
    # Break types the question already covers, so they are not printed.
    understood = (breaks.SPACE, breaks.LINE_BREAK)
    word_text = ""  # text accumulated since the previous detected break
    for page in annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        word_text += symbol.text
                        break_type = symbol.property.detected_break.type
                        # 0 (UNKNOWN) is also the value when no break is set.
                        if break_type:
                            if break_type not in understood:
                                print("%s %s" % (word_text, symbol.property.detected_break))
                            word_text = ""

if __name__ == '__main__':
    detect_breaks(sys.argv[1])
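
On the question of whether the breaks can be replaced by a newline or a space, here is a minimal sketch based on the descriptions in the BreakType reference: SPACE and SURE_SPACE are treated as a space, EOL_SURE_SPACE and LINE_BREAK as the end of a line, and HYPHEN as the end of a line where the word was split and the hyphen itself is not part of the recognised text. It works on plain (text, break_name) pairs instead of real API objects, so symbols_to_lines and the pair format are hypothetical names used only for illustration. It also assumes every break comes after its symbol (is_prefix is false). Putting the hyphen back is one possible choice; joining the two halves of the word is another.

```python
# Break names as listed in the Vision API BreakType enum.
SPACE_BREAKS = {"SPACE", "SURE_SPACE"}
LINE_END_BREAKS = {"EOL_SURE_SPACE", "LINE_BREAK"}


def symbols_to_lines(symbols):
    """Join (text, break_name) pairs into a list of line strings.

    break_name is None when the symbol carries no detected break.
    Assumption: every break follows its symbol (is_prefix is false).
    """
    lines = []
    current = ""
    for text, break_name in symbols:
        current += text
        if break_name in SPACE_BREAKS:
            current += " "
        elif break_name in LINE_END_BREAKS:
            lines.append(current)
            current = ""
        elif break_name == "HYPHEN":
            # The hyphen is not in the recognised text, so add it back
            # to show that the word continues on the next line.
            lines.append(current + "-")
            current = ""
        # UNKNOWN or None: no separator is added.
    if current:
        lines.append(current)
    return lines


# Hand-written sample standing in for the symbols of one paragraph.
sample = [
    ("H", None), ("i", "SPACE"),
    ("t", None), ("h", None), ("e", None), ("r", None), ("e", "EOL_SURE_SPACE"),
    ("e", None), ("x", None), ("a", None), ("m", "HYPHEN"),
    ("p", None), ("l", None), ("e", "LINE_BREAK"),
]
print(symbols_to_lines(sample))  # ['Hi there', 'exam-', 'ple']
```

With a real response, the pairs would be built from symbol.text and the name of symbol.property.detected_break.type while walking the same page, block, paragraph, word and symbol loops as in the script above.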

ocr

text-segmentation

google-vision

google-cloud-vision