Extracting data from specific image locations using the Google Vision OCR API

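I am using Google's Vision OCR API to try to extract two types of data from an image: 1) handwritten text from text boxes, circled in red below, and 2) checkboxes or 'x' marks, circled in green below. I will be entering this data into a database, so I need a string returned for each of the two types of data.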

Currently, when I pass this image to the API, I get back a single string containing all the data:

Secondary School Study Student Perception of Computers LO 13 . Are any of your family members working in computing / IT ? If so , what family member ( s ) is it ( eg , parent , guardian , brother , sister etc . ) brother 14 . Have you any previous computing experience ( even attended a single day ) ? Select one or many areas : U CODER DOJO IN SCHOOL CAMP VSELF TAUGHT JOTHER If you selected any from Q14 , was the general experience : GOOD NEITHER GOOD OR BAD BAD BAD And why ( short answer , under 4 words ) learned new skills To be completed after the camp . NewsLRY 1 . I would now consider a career in computing / IT . Strongly Agree Agree No Opinion Disagree Strongly Disagree 2 . The camp showed me what a career in computing / IT really was . ? Strongly Agree Agree No Opinion Disagree Strongly Disagree 3 . The camp showed / highlighted that I was no good at programming or computing . Strongly Agree Agree No Opinion Disagree Strongly Disagree 4 . Give two things that you did not know about computing / programming until after the camp ? java Language Eclipse IDE va 5 . I was better than I first thought ( before the camp ) at programming / computing . ? Agree No Opinion Disagree Strongly Disagree ? O Strongly Agree 6 . Any feedback / comments about the camp ( good or bad ) ? good camp , Learned a lot . Thank you for taking this survey . Page 2 of 2

My code:

    using System;
    using System.Linq;
    using Google.Cloud.Vision.V1;

    public static class Program
    {
        public static void Main(string[] args)
        {
            string credential_path = @"C:\Users385\nodal.json";
            System.Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", credential_path);

            // Instantiate a client
            var client = ImageAnnotatorClient.Create();
            // Load the image file into memory
            var image = Image.FromFile("stack.jpg");
            // Perform document text detection on the image file
            var response = client.DetectDocumentText(image);

            string words = "";

            foreach (var page in response.Pages)
            {
                foreach (var block in page.Blocks)
                {
                    // Bounding box of the block (computed but not used yet)
                    string box = string.Join(" - ", block.BoundingBox.Vertices.Select(v => $"({v.X}, {v.Y})"));
                    foreach (var paragraph in block.Paragraphs)
                    {
                        // Bounding box of the paragraph (computed but not used yet)
                        box = string.Join(" - ", paragraph.BoundingBox.Vertices.Select(v => $"({v.X}, {v.Y})"));
                        foreach (var word in paragraph.Words)
                        {
                            // Rebuild the word from its symbols and append it
                            words += $" {string.Join("", word.Symbols.Select(s => s.Text))}";
                        }
                    }
                }
            }

            // Print every detected word as one space-separated string
            Console.WriteLine(words);
        }
    }

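So my questions:

  1. How do I extract the data from each red box (i.e. the first text box should return 'brother' and the second text box should return 'learned new skills')?
  2. How do I extract which checkbox is marked for each green question (i.e. question 13 should return 'YES', question 14 should return 'SELF TAUGHT', etc.)?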

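I have only used the API from some PHP scripts, but I think your problem does not depend on the programming language. You need to use the coordinates of the detected words (more precisely, the bounding boxes with four vertices). With those, you can find the questionnaire elements that the participant's writing belongs to. This script was a good entry point for me: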

https://www.leanx.eu/tutorials/use-google-cloud-vision-api-to-process-invoices-and-receipts

You can use it "as is" on any PHP-enabled web space, and it gives you a well-structured overview of what the API returns.
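To make the coordinate idea concrete, here is a minimal sketch in C# (to match the question's code, and needing C# 9 or later for records). It does not call the API: `WordBox`, `Region`, `RegionExtractor` and every pixel value are hypothetical stand-ins. In real use, each `WordBox` would presumably be filled from the `word.BoundingBox.Vertices` already available in the question's loop, and each `Region` would be measured once on a blank copy of the form. It also assumes single-line answers, so sorting by the left edge is enough.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one OCR word: its text plus the axis-aligned
// extent of its four bounding-box vertices, in image pixels.
public record WordBox(string Text, int Left, int Top, int Right, int Bottom)
{
    public double CenterX => (Left + Right) / 2.0;
    public double CenterY => (Top + Bottom) / 2.0;
}

// Hypothetical answer area on the form (e.g. one red text box).
public record Region(string Name, int Left, int Top, int Right, int Bottom)
{
    // A word belongs to the region when its center lies inside it.
    public bool Contains(WordBox w) =>
        w.CenterX >= Left && w.CenterX <= Right &&
        w.CenterY >= Top && w.CenterY <= Bottom;
}

public static class RegionExtractor
{
    // Joins the words inside the region, ordered left to right.
    public static string ExtractText(Region region, IEnumerable<WordBox> words) =>
        string.Join(" ", words.Where(w => region.Contains(w))
                              .OrderBy(w => w.Left)
                              .Select(w => w.Text));

    public static void Main()
    {
        // Made-up coordinates, deliberately out of reading order.
        var words = new List<WordBox>
        {
            new WordBox("skills", 300, 511, 400, 552),
            new WordBox("brother", 120, 210, 260, 250),
            new WordBox("14", 40, 300, 70, 330),
            new WordBox("learned", 110, 510, 220, 550),
            new WordBox("new", 230, 512, 290, 548),
        };

        var q13 = new Region("Q13 family member", 100, 200, 600, 260);
        var q14Why = new Region("Q14 why", 100, 500, 600, 560);

        Console.WriteLine(ExtractText(q13, words));    // brother
        Console.WriteLine(ExtractText(q14Why, words)); // learned new skills
    }
}
```

Testing the word's center rather than requiring the whole box to fit inside the region tolerates handwriting that spills slightly over the printed border.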

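With these boxes and knowledge of the questionnaire text, it should be easy to find the check marks, provided Google detects the marks your participants made. Detection of check marks may not always work with Google Vision, because Google's OCR does not always find a single "character".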
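For the checkboxes, one possible sketch in C# (again needing C# 9 or later for records) assumes the OCR did return the mark as its own symbol with a box. It pairs the mark with the closest printed option label. `Box`, `CheckboxMatcher`, the coordinates and the 100-pixel cutoff are all hypothetical. When the mark is not detected, or is fused with the label as in "VSELF TAUGHT" in the question's output, this lookup has nothing to work with.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in: a piece of detected text and the center of its box.
public record Box(string Text, double CenterX, double CenterY);

public static class CheckboxMatcher
{
    // Returns the text of the label nearest to the mark, or "" when no
    // label lies within maxDistance pixels of it.
    public static string NearestLabel(Box mark, IEnumerable<Box> labels, double maxDistance)
    {
        string nearest = "";
        double bestDistance = maxDistance;
        foreach (var label in labels)
        {
            double dx = label.CenterX - mark.CenterX;
            double dy = label.CenterY - mark.CenterY;
            double distance = Math.Sqrt(dx * dx + dy * dy);
            if (distance <= bestDistance)
            {
                bestDistance = distance;
                nearest = label.Text;
            }
        }
        return nearest;
    }

    public static void Main()
    {
        // Made-up label positions for the options of question 14.
        var labels = new List<Box>
        {
            new Box("CODER DOJO", 200, 400),
            new Box("IN SCHOOL", 400, 400),
            new Box("CAMP", 560, 400),
            new Box("SELF TAUGHT", 720, 400),
            new Box("OTHER", 900, 400),
        };

        var mark = new Box("V", 650, 402);       // tick just left of "SELF TAUGHT"
        var strayMark = new Box("X", 650, 900);  // far from every option

        Console.WriteLine(NearestLabel(mark, labels, 100));  // SELF TAUGHT
        string stray = NearestLabel(strayMark, labels, 100);
        Console.WriteLine(stray == "" ? "(no match)" : stray); // (no match)
    }
}
```

The distance cutoff keeps a stray symbol elsewhere on the page from being assigned to an option. A sensible value depends on the scan resolution and the form layout.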