Detect Text in PDF/TIFF Files with Google Cloud Vision

Question

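I want to detect text in PDF and TIFF files with Google Cloud Vision, but it looks like this only works if the files are first stored in Google Cloud Storage. Is there a way to do it without storing them in the cloud?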

Answer 1

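If you have an image (I haven't tried this with a PDF; you may need to convert it to an image first), you can convert it to base64 and send it inline.

Some code snippets: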

// Main code, abbreviated -----

var cloudVisionUrl = $"{annotationTextApiUrl}{annotationTextApiKey}";
var imageBase64 = DoYourOwnImageToBase64(path);
var client = new HttpClient();

// One request: the base64-encoded image plus the features to run on it.
var requests = new ApiRequest
{
    Requests = new List<Request>
    {
        new Request
        {
            Image = new Image { Content = imageBase64 },
            Features = new List<Feature> { new Feature { Type = "TEXT_DETECTION" } }
        }
    }
};

var httpResponse = await client.PostAsJsonAsync(cloudVisionUrl, requests);

// ---------------------------------------

public class ApiRequest
{
    public ApiRequest()
    {
        Requests = new List<Request>();
    }

    [JsonProperty("requests")]
    public List<Request> Requests { get; set; }
}

public class Request
{
    [JsonProperty("image")]
    public Image Image { get; set; }

    [JsonProperty("features")]
    public List<Feature> Features { get; set; }
}

// Image payload referenced by Request; Content holds the base64-encoded bytes.
public class Image
{
    [JsonProperty("content")]
    public string Content { get; set; }
}

public class Feature
{
    [JsonProperty("type")]
    public string Type { get; set; }
}
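
The snippet above leaves DoYourOwnImageToBase64 as a placeholder. One possible way to fill it in is sketched here, assuming the file fits in memory and that the API wants plain base64 with no data-URI prefix. The ImageEncoding class name is made up for illustration.

```csharp
using System;
using System.IO;

public static class ImageEncoding
{
    // Hypothetical implementation of the DoYourOwnImageToBase64 placeholder:
    // reads the whole file into memory and returns its bytes as plain base64.
    public static string DoYourOwnImageToBase64(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return Convert.ToBase64String(bytes);
    }
}
```

For very large files you would probably want to check the size first, since the request body has to carry the whole encoded file.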

Answer 2

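Currently, you need to store the content in a Google Cloud Storage bucket. However, there is a feature request to read PDF files without having to store them in a bucket. I suggest starring that issue and leaving a comment to indicate that it would help with your situation.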

Answer 3

This is now possible. Just convert your file to base64 and put it in the content field of your inputConfig. The supported formats are PDF, GIF, and TIFF.
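
A minimal sketch of what that might look like in C#, assuming the synchronous v1 `files:annotate` REST endpoint, API-key authentication, and the `DOCUMENT_TEXT_DETECTION` feature. The class and method names (`PdfTextDetection`, `BuildRequestJson`, `AnnotateAsync`) are made up for illustration, and the response is returned as raw JSON rather than parsed.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class PdfTextDetection
{
    // Builds the JSON body with the file inlined as base64 in inputConfig.content.
    // mimeType is expected to be "application/pdf", "image/tiff" or "image/gif".
    public static string BuildRequestJson(byte[] fileBytes, string mimeType)
    {
        var body = new
        {
            requests = new[]
            {
                new
                {
                    inputConfig = new
                    {
                        content = Convert.ToBase64String(fileBytes),
                        mimeType
                    },
                    features = new[] { new { type = "DOCUMENT_TEXT_DETECTION" } }
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }

    // Posts the request and returns the raw JSON response.
    // Assumption: the endpoint URL and the ?key= parameter match your project setup.
    public static async Task<string> AnnotateAsync(string path, string mimeType, string apiKey)
    {
        var json = BuildRequestJson(File.ReadAllBytes(path), mimeType);
        var url = "https://vision.googleapis.com/v1/files:annotate?key="
                  + Uri.EscapeDataString(apiKey);

        using var client = new HttpClient();
        using var payload = new StringContent(json, Encoding.UTF8, "application/json");
        var response = await client.PostAsync(url, payload);
        return await response.Content.ReadAsStringAsync();
    }
}
```

Calling `await PdfTextDetection.AnnotateAsync("scan.pdf", "application/pdf", apiKey)` would then send the file without touching a storage bucket. Inline requests are limited in size and page count, so check the current documentation before sending large documents.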

c#

google-cloud-vision