Google Cloud Vision API: how to read text and structure it

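I am using the Google Cloud Vision API with Python to scan documents and read the text in them. The documents are invoices that contain customer details and tables. The conversion from document to text works perfectly, but the resulting data is not ordered. I cannot find a way to sort the data, which I need in order to extract certain values from it. The data I want to extract also sits in different positions from one document to the next, which makes extraction difficult.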

https://cloud.google.com/vision/docs/fulltext-annotations

Here is my Python code:

import io
import os
from google.cloud import vision
from google.cloud.vision import types
import glob


def scan_img(image_file):
    with io.open(image_file, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)

    response = client.document_text_detection(image=image)
    document = response.full_text_annotation
    img_out_array = document.text.split("\n")
    invoice_no_raw = ""
    invoice_date_raw = ""
    net_total_idx = ""
    customer_name_index = ""

    for index, line in enumerate(img_out_array):
        if "Invoice No" in line:
            invoice_no_raw = line
        if "Customer Name" in line:
            index += 6
            customer_name_index = index
        if "Date :" in line:
            invoice_date_raw = line
        if "Our Bank details" in line:
            index -= 1
            net_total_idx = index

    net_total_sales_raw = img_out_array[net_total_idx]
    customer_name_raw = img_out_array[customer_name_index]
    print("Raw data:: ", invoice_no_raw, invoice_date_raw, customer_name_raw, img_out_array[net_total_idx])

    invoice_no = invoice_no_raw.split(":")[1]
    invoice_date = invoice_date_raw.split(":")[1]
    customer_name = customer_name_raw.replace("..", "")
    net_total_sales = net_total_sales_raw.split(" ")[-1]

    return [invoice_no, invoice_date, customer_name, net_total_sales]


if __name__ == '__main__':
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/imgtotext.json"
    client = vision.ImageAnnotatorClient()
    images = glob.glob("/path/Documents/invoices/*.jpg")
    for image in images:
        print("scanning the image:::::" + image)
        invoice_no, invoice_date, customer_name, net_total_sales = scan_img(image)
        print("Formatted data:: ", invoice_no, invoice_date, customer_name, net_total_sales)

Document 1 output:

Customer Name
Address
**x customer**
area name
streetname
Customer LPO

Document 2 output:

Customer LPO
**y customer**
area name
streetname
LPO Date
Payment Terms
Customer Name
Address
Delivery Location

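Please advise. I want to read "x customer" and "y customer", but their position changes from document to document, and I have several documents. How can I structure the output and read this data?

There are several other fields that I am able to read successfully.

Thanks in advance.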

The Cloud Vision API has no specific request property for specifying the format in which a file's data is read or ordered. Instead, I think the available workaround is to use the BoundingPoly and Vertex response properties, which give the coordinates of each word contained in the image. You can process the vertex data within your own code logic and decide which text needs to be grouped by columns and rows. You can take a look at this link, which includes some response examples containing these properties.
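As a rough illustration of that workaround, here is a minimal sketch that flattens the `full_text_annotation` hierarchy (pages, blocks, paragraphs, words, symbols) into words with coordinates and then groups them into rows. The function names `words_with_positions` and `group_into_rows` and the `y_tolerance` threshold are my own assumptions, not part of the Vision API, and the tolerance would need tuning for your scan resolution.

```python
def words_with_positions(document):
    """Flatten a full_text_annotation into (text, x, y) tuples.

    x and y are the smallest vertex coordinates of each word's
    bounding box, i.e. roughly its top-left corner.
    """
    words = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    text = "".join(symbol.text for symbol in word.symbols)
                    vertices = word.bounding_box.vertices
                    x = min(v.x for v in vertices)
                    y = min(v.y for v in vertices)
                    words.append((text, x, y))
    return words


def group_into_rows(words, y_tolerance=10):
    """Group (text, x, y) tuples into text rows, top to bottom.

    y_tolerance is a hypothetical pixel threshold: a word joins the
    current row when its y is within that distance of the y of the
    word that started the row. Words in a row are ordered by x.
    """
    rows = []
    for text, x, y in sorted(words, key=lambda w: w[2]):
        if rows and abs(y - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["words"].append((x, text))
        else:
            rows.append({"y": y, "words": [(x, text)]})
    return [" ".join(t for _, t in sorted(row["words"])) for row in rows]


# Usage with made-up coordinates (no API call needed):
sample_words = [
    ("customer", 320, 102), ("Customer", 10, 100), ("Name", 90, 101),
    ("x", 300, 100), ("Address", 10, 130), ("area", 300, 131),
    ("name", 340, 130),
]
for row in group_into_rows(sample_words):
    print(row)
```

In your script you would call `group_into_rows(words_with_positions(response.full_text_annotation))` instead of splitting `document.text` on newlines. If a label such as "Customer Name" and its value sit on the same printed line, they should then land in the same row regardless of the order in which the API returned them.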

If this does not cover your current needs, you can use the Send Feedback button located on the service's public documentation, or use the Issue Tracker tool to raise a Vision API feature request and let Google know about the feature you need.