Why is the Google PDF DOCUMENT_TEXT_DETECTION API much slower than the Google JPG DOCUMENT_TEXT_DETECTION API?

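I noticed that Google Vision PDF OCR with DOCUMENT_TEXT_DETECTION takes about 15 seconds to detect the text in a single PDF page (https://cloud.google.com/vision/docs/pdf).
However, if I submit the same PDF page as a JPG, detecting the text takes less than 3 seconds (https://cloud.google.com/vision/docs/detecting-fulltext).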

I used the C# code provided here: https://cloud.google.com/vision/docs/pdf#vision-pdf-detection-gcs-csharp

I noticed that the following line takes about 15 seconds to detect all the text in the PDF and save it to the GCS bucket: operation.PollUntilCompleted();

My GCS bucket uses "Multi-Regional Storage" in the US.
I am also uploading from a US location.

Is there anything else I can do to speed this up, or is this expected?

You may find the answer to your query in this Google Groups thread. In summary:

The offline batch API is not designed to take short running time as the first priority. Instead, it aims to provide scheduling for a large number of multi-page PDF/TIFF files according to quota limits. So instead of sending PDF/TIFF files one by one and wait for each one to succeed, the typical way to use it is to send as many PDF/TIFF files as possible at one time or continuously, track each operation id to get the final result of each PDF/TIFF processing.
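The usage pattern described in that quote can be sketched in C# roughly as follows. This is a minimal sketch, not the official sample: it assumes the Google.Cloud.Vision.V1 NuGet package, and the class name BatchPdfOcr, the helper names, and the "doc-N/" output naming scheme are hypothetical. The idea is to start every operation first and only then wait, so the per-file latency overlaps instead of adding up.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Google.Cloud.Vision.V1;

public static class BatchPdfOcr
{
    // Builds one asynchronous OCR request for a single PDF stored in Cloud Storage.
    public static AsyncAnnotateFileRequest BuildRequest(string sourceUri, string destinationPrefix)
    {
        var request = new AsyncAnnotateFileRequest
        {
            InputConfig = new InputConfig
            {
                GcsSource = new GcsSource { Uri = sourceUri },
                MimeType = "application/pdf"
            },
            OutputConfig = new OutputConfig
            {
                BatchSize = 2, // pages per output JSON file, as in the documentation sample
                GcsDestination = new GcsDestination { Uri = destinationPrefix }
            }
        };
        request.Features.Add(new Feature { Type = Feature.Types.Type.DocumentTextDetection });
        return request;
    }

    // Submits all PDFs up front, then waits for every operation to finish.
    // Returns the operation names so each result can be tracked individually.
    public static async Task<IReadOnlyList<string>> RunAsync(
        IReadOnlyList<string> sourceUris, string outputPrefix)
    {
        ImageAnnotatorClient client = await ImageAnnotatorClient.CreateAsync();

        var pending = new List<Task<string>>();
        for (int i = 0; i < sourceUris.Count; i++)
        {
            // Hypothetical naming scheme: one output folder per input file.
            string destination = $"{outputPrefix}doc-{i}/";
            pending.Add(ProcessAsync(client, sourceUris[i], destination));
        }
        return await Task.WhenAll(pending);
    }

    private static async Task<string> ProcessAsync(
        ImageAnnotatorClient client, string sourceUri, string destination)
    {
        // Start the long-running operation for this file.
        var operation = await client.AsyncBatchAnnotateFilesAsync(
            new[] { BuildRequest(sourceUri, destination) });

        // Poll without blocking a thread; other files keep processing meanwhile.
        var completed = await operation.PollUntilCompletedAsync();
        return completed.Name;
    }
}
```

A call such as BatchPdfOcr.RunAsync(new[] { "gs://my-bucket/a.pdf", "gs://my-bucket/b.pdf" }, "gs://my-bucket/ocr/") (bucket names are placeholders) would not make any single file faster, but the total wall-clock time should be closer to that of the slowest file than to the sum of all of them, subject to quota limits.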

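The C# client library does not yet seem to have the small batch online processing feature mentioned in the comments. A workaround is to call the REST API directly or to use a client library for a different language.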
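Calling the REST API directly from C# might look like the sketch below. It rests on several assumptions that are not in the original answer: that the feature in question is the synchronous files:annotate endpoint (which accepts inline PDF content and a short list of page numbers), that an OAuth access token is already available (for example from gcloud auth print-access-token), and the class and method names are hypothetical. Check the current REST reference before relying on the field names.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class OnlinePdfOcr
{
    // Assumed endpoint for small-batch online (synchronous) file annotation.
    private const string Endpoint = "https://vision.googleapis.com/v1/files:annotate";

    // Builds the JSON body: the PDF is sent inline as base64, limited to the given pages.
    public static string BuildRequestJson(byte[] pdfBytes, int[] pages)
    {
        var body = new
        {
            requests = new[]
            {
                new
                {
                    inputConfig = new
                    {
                        content = Convert.ToBase64String(pdfBytes),
                        mimeType = "application/pdf"
                    },
                    features = new[] { new { type = "DOCUMENT_TEXT_DETECTION" } },
                    pages
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }

    // Sends the request and returns the raw JSON response for the caller to parse.
    public static async Task<string> AnnotateAsync(
        HttpClient http, string accessToken, byte[] pdfBytes, int[] pages)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, Endpoint);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        request.Content = new StringContent(
            BuildRequestJson(pdfBytes, pages), Encoding.UTF8, "application/json");

        using HttpResponseMessage response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Because the response comes back in the same HTTP call, there is no operation to poll and no round trip through a storage bucket, which is where much of the latency in the asynchronous flow would be expected to come from.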

c#

asp.net

google-api

google-cloud-platform

google-vision