Google Document AI C# MIME: Unsupported input file format
I'm trying to upload a PDF to Google's Document AI service for processing, using Google's Google.Cloud.DocumentAI.V1 library for C#. I've looked through GitHub and the docs, and there isn't much information. The PDF is on a local drive. I convert the PDF to a byte array, then convert that to a ByteString. I then set the request MIME type to "application/pdf", but it returns an error:
Status(StatusCode="InvalidArgument", Detail="Unsupported input file format.", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1627582435.256000000","description":"Error received from peer ipv4:142.250.72.170:443","file":"......\src\core\lib\surface\call.cc","file_line":1067,"grpc_message":"Unsupported input file format.","grpc_status":3}")
Code:
try
{
    //Generate a document
    string pdfFilePath = "C:\\Users\\maponte\\Documents\\Projects\\SettonProjects\\OCRSTUFF\\DOC071621-0016.pdf";
    var bytes = Encoding.UTF8.GetBytes(pdfFilePath);
    ByteString content = ByteString.CopyFrom(bytes);
    // Create client
    DocumentProcessorServiceClient documentProcessorServiceClient = await DocumentProcessorServiceClient.CreateAsync();
    // Initialize request argument(s)
    ProcessRequest request = new ProcessRequest
    {
        ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
        SkipHumanReview = false,
        InlineDocument = new Document(),
        RawDocument = new RawDocument(),
    };
    request.RawDocument.MimeType = "application/pdf";
    request.RawDocument.Content = content;
    // Make the request
    ProcessResponse response = await documentProcessorServiceClient.ProcessDocumentAsync(request);
    Document docResponse = response.Document;
    Console.WriteLine(docResponse.Text);
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
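This is the problem (or at least one problem): you aren't actually loading the file. Encoding.UTF8.GetBytes encodes the text of the path itself, so the service receives the characters of the file name rather than the PDF data:
string pdfFilePath = "C:\\Users\\maponte\\Documents\\Projects\\SettonProjects\\OCRSTUFF\\DOC071621-0016.pdf";
var bytes = Encoding.UTF8.GetBytes(pdfFilePath);
ByteString content = ByteString.CopyFrom(bytes);
You want this instead: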
string pdfFilePath = "path-as-before";
var bytes = File.ReadAllBytes(pdfFilePath);
ByteString content = ByteString.CopyFrom(bytes);
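To make the difference concrete, here is a minimal sketch that needs no Document AI dependency. It compares the bytes produced by each call and checks for the "%PDF-" signature that PDF files normally begin with. The PdfCheck class, the LooksLikePdf helper, and the temporary stand-in file are all hypothetical names introduced for illustration; they are not part of the Google library.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfCheck
{
    // Hypothetical helper: true when the buffer starts with the "%PDF-" signature.
    public static bool LooksLikePdf(byte[] bytes)
    {
        byte[] magic = Encoding.ASCII.GetBytes("%PDF-");
        if (bytes == null || bytes.Length < magic.Length)
        {
            return false;
        }
        for (int i = 0; i < magic.Length; i++)
        {
            if (bytes[i] != magic[i])
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        // Assumption: a temporary stand-in file replaces the real PDF so the sketch runs anywhere.
        string path = Path.Combine(Path.GetTempPath(), "sample-" + Guid.NewGuid().ToString("N") + ".pdf");
        File.WriteAllBytes(path, Encoding.ASCII.GetBytes("%PDF-1.7\n% stand-in content, not a complete PDF\n"));
        try
        {
            byte[] pathBytes = Encoding.UTF8.GetBytes(path); // bytes of the path text
            byte[] fileBytes = File.ReadAllBytes(path);      // bytes stored in the file

            Console.WriteLine(LooksLikePdf(pathBytes)); // False
            Console.WriteLine(LooksLikePdf(fileBytes)); // True
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

A check like this before building the request can catch the mistake locally instead of waiting for an InvalidArgument response from the service.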
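However, I'd also point out that InlineDocument and RawDocument are alternatives to each other: setting one of them clears the other. Your request creation would be better written as: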
ProcessRequest request = new ProcessRequest
{
    ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
    SkipHumanReview = false,
    RawDocument = new RawDocument
    {
        MimeType = "application/pdf",
        Content = content
    }
};