Extracting data from specific image locations using the Google Vision OCR API

Question

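I'm using Google's Vision OCR API to try to extract two types of data from an image: 1) handwritten text from text boxes, marked below with red circles, and 2) checkboxes or 'x' marks, marked below with green circles. I will be entering this data into a database, so I need a string returned for both types of data.

Currently, when I pass this image to the API, I get a single string containing all of the data: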

Secondary School Study Student Perception of Computers LO 13 . Are any of your family members working in computing / IT ? If so , what family member ( s ) is it ( eg , parent , guardian , brother , sister etc . ) brother 14 . Have you any previous computing experience ( even attended a single day ) ? Select one or many areas : U CODER DOJO IN SCHOOL CAMP VSELF TAUGHT JOTHER If you selected any from Q14 , was the general experience : GOOD NEITHER GOOD OR BAD BAD BAD And why ( short answer , under 4 words ) learned new skills To be completed after the camp . NewsLRY 1 . I would now consider a career in computing / IT . Strongly Agree Agree No Opinion Disagree Strongly Disagree 2 . The camp showed me what a career in computing / IT really was . ? Strongly Agree Agree No Opinion Disagree Strongly Disagree 3 . The camp showed / highlighted that I was no good at programming or computing . Strongly Agree Agree No Opinion Disagree Strongly Disagree 4 . Give two things that you did not know about computing / programming until after the camp ? java Language Eclipse IDE va 5 . I was better than I first thought ( before the camp ) at programming / computing . ? Agree No Opinion Disagree Strongly Disagree ? O Strongly Agree 6 . Any feedback / comments about the camp ( good or bad ) ? good camp , Learned a lot . Thank you for taking this survey . Page 2 of 2

My code:

    using System;
    using System.Linq;
    using System.Text;
    using Google.Cloud.Vision.V1;

    public static class Program
    {
        public static void Main(string[] args)
        {
            // Point the client library at the service account key file
            string credential_path = @"C:\Users385\nodal.json";
            Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", credential_path);

            // Instantiates a client
            var client = ImageAnnotatorClient.Create();
            // Load the image file into memory
            var image = Image.FromFile("stack.jpg");
            // Performs document text detection on the image file
            var response = client.DetectDocumentText(image);

            var words = new StringBuilder();

            foreach (var page in response.Pages)
            {
                foreach (var block in page.Blocks)
                {
                    // Bounding box of the block (computed but not used yet)
                    string box = string.Join(" - ", block.BoundingBox.Vertices.Select(v => $"({v.X}, {v.Y})"));
                    foreach (var paragraph in block.Paragraphs)
                    {
                        // Bounding box of the paragraph (computed but not used yet)
                        box = string.Join(" - ", paragraph.BoundingBox.Vertices.Select(v => $"({v.X}, {v.Y})"));
                        foreach (var word in paragraph.Words)
                        {
                            // Rebuild each word from its symbols and append it to the output
                            words.Append(' ').Append(string.Join("", word.Symbols.Select(s => s.Text)));
                        }
                    }
                }
            }

            Console.WriteLine(words.ToString());
        }
    }

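So my questions are:

How can I extract the data from each of the red boxes (i.e. the first text box should return 'brother' and the second should return 'learned new skills')?
How can I extract which checkbox is marked for each of the green questions (i.e. question 13 should return 'YES', question 14 should return 'SELF TAUGHT', etc.)?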

Answer 1

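I have only used the API from PHP scripts, but I don't think your problem depends on the programming language. You need to use the coordinates of the detected words (more precisely, their bounding boxes, each with four vertices). With those, you can work out which questionnaire element each piece of the participant's writing belongs to. This script was a good starting point for me: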

https://www.leanx.eu/tutorials/use-google-cloud-vision-api-to-process-invoices-and-receipts

You can use it "as is" on any PHP-enabled webspace, and it gives you a well-structured overview of what the API returns.
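For the C# code in the question, the coordinate idea could look roughly like the sketch below. It does not call the Vision API: `OcrWord`, `Region` and `RegionExtractor` are hypothetical helper types, and all pixel coordinates are invented for illustration. In real code the text and vertices would be copied from each `word.Symbols` and `word.BoundingBox.Vertices`, and the region rectangles would be measured once on a blank copy of the form. It also assumes every scan is aligned the same way, which may not hold for photographed pages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain type; in real code the values would be copied from
// word.Symbols and word.BoundingBox.Vertices of the Vision response.
public record OcrWord(string Text, int[] Xs, int[] Ys)
{
    // Centre of the four-vertex bounding box
    public double CenterX => Xs.Average();
    public double CenterY => Ys.Average();
}

// Hypothetical axis-aligned answer area on the form, in image pixels.
public record Region(string Name, int Left, int Top, int Right, int Bottom)
{
    public bool Contains(OcrWord w) =>
        w.CenterX >= Left && w.CenterX <= Right &&
        w.CenterY >= Top && w.CenterY <= Bottom;
}

public static class RegionExtractor
{
    // Joins the words whose box centre lies inside the region.
    // The input order is kept, assuming the words arrive in reading order.
    public static string TextIn(Region region, IEnumerable<OcrWord> words) =>
        string.Join(" ", words.Where(region.Contains).Select(w => w.Text));
}

public static class Program
{
    public static void Main()
    {
        // Invented coordinates: one printed word outside the answer box,
        // one handwritten word inside it.
        var words = new List<OcrWord>
        {
            new("sister", new[] { 700, 760, 760, 700 }, new[] { 300, 300, 320, 320 }),
            new("brother", new[] { 120, 220, 220, 120 }, new[] { 350, 350, 380, 380 }),
        };
        var q13Answer = new Region("Q13 answer", 100, 340, 600, 400);

        Console.WriteLine(RegionExtractor.TextIn(q13Answer, words)); // prints "brother"
    }
}
```

Testing the box centre rather than all four vertices keeps a word that slightly overlaps the region border; whether that is the right tolerance depends on how tightly the regions are drawn.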

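With these boxes, and knowing the text of the questionnaire, it should be easy to find the check marks your participants made, provided Google detects them. Check mark detection might not always work with Google Vision, because Google's OCR does not always pick up a single "character".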
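One way to turn a detected check mark into an answer is to pick the printed option label closest to it. The sketch below is only an illustration of that idea: `Box` and `CheckboxMatcher` are hypothetical types, the coordinates are invented, and it assumes the mark has already been isolated as its own box. In the output shown in the question the mark is fused into the label ("VSELF TAUGHT"), so in practice it may have to be split off at the symbol level first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a piece of detected text with the centre of its bounding box.
public record Box(string Text, double CenterX, double CenterY);

public static class CheckboxMatcher
{
    private static double Distance(Box a, Box b)
    {
        double dx = a.CenterX - b.CenterX;
        double dy = a.CenterY - b.CenterY;
        return Math.Sqrt(dx * dx + dy * dy);
    }

    // Returns the text of the label nearest to the mark,
    // or null when no label lies within maxDistance pixels.
    public static string NearestLabel(Box mark, IEnumerable<Box> labels, double maxDistance)
    {
        var best = labels
            .Select(l => (Label: l, Dist: Distance(mark, l)))
            .Where(x => x.Dist <= maxDistance)
            .OrderBy(x => x.Dist)
            .FirstOrDefault();
        // When nothing matched, the default tuple carries a null Label.
        return best.Label?.Text;
    }
}

public static class Program
{
    public static void Main()
    {
        // Invented label positions for the options of question 14.
        var labels = new List<Box>
        {
            new("CODER DOJO", 150, 500),
            new("IN SCHOOL", 300, 500),
            new("CAMP", 420, 500),
            new("SELF TAUGHT", 560, 500),
            new("OTHER", 700, 500),
        };
        var mark = new Box("V", 500, 498); // stray symbol produced by a tick

        Console.WriteLine(CheckboxMatcher.NearestLabel(mark, labels, 100)); // prints "SELF TAUGHT"
    }
}
```

Plain nearest-centre matching only works if each checkbox sits closer to its own label than to the neighbouring one. If the boxes are printed to the left of long labels, comparing the mark with the left edge of each label is likely to be more reliable.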

c#

api

ocr

google-api

google-vision