Google Document AI C# MIME: "Unsupported input file format"

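I'm trying to upload a PDF to Google's Document AI service for processing, using Google's Google.Cloud.DocumentAI.V1 library from C#. I've looked through GitHub and the documentation, but there isn't much information. The PDF is on a local drive. I convert the PDF to a byte array and then convert that to a ByteString. I then set the request MIME type to "application/pdf", but the call returns an error: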

Status(StatusCode="InvalidArgument", Detail="Unsupported input file format.", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1627582435.256000000","description":"Error received from peer ipv4:142.250.72.170:443","file":"......\src\core\lib\surface\call.cc","file_line":1067,"grpc_message":"Unsupported input file format.","grpc_status":3}")

Code:

try
{
    //Generate a document
    string pdfFilePath = @"C:\Users\maponte\Documents\Projects\SettonProjects\OCRSTUFF\DOC071621-0016.pdf";
    var bytes = Encoding.UTF8.GetBytes(pdfFilePath);


    ByteString content = ByteString.CopyFrom(bytes);

    // Create client
    DocumentProcessorServiceClient documentProcessorServiceClient = await DocumentProcessorServiceClient.CreateAsync();
    // Initialize request argument(s)
    ProcessRequest request = new ProcessRequest
    {
        ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
        SkipHumanReview = false,
        InlineDocument = new Document(),
        RawDocument = new RawDocument(),
    };
    
    request.RawDocument.MimeType = "application/pdf";
    request.RawDocument.Content = content;

    // Make the request
    ProcessResponse response = await documentProcessorServiceClient.ProcessDocumentAsync(request);

    Document docResponse = response.Document;

    Console.WriteLine(docResponse.Text);
   
}
catch(Exception ex)
{
    Console.WriteLine(ex.Message);
}

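This is the problem (or at least one problem): you're not actually loading the file. You're encoding the path string itself as UTF-8, so the service receives the bytes of the text "C:\Users\..." rather than the PDF's contents: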

string pdfFilePath = @"C:\Users\maponte\Documents\Projects\SettonProjects\OCRSTUFF\DOC071621-0016.pdf";
var bytes = Encoding.UTF8.GetBytes(pdfFilePath);

ByteString content = ByteString.CopyFrom(bytes);

You want this instead (File.ReadAllBytes is in System.IO):

string pdfFilePath = "path-as-before";
var bytes = File.ReadAllBytes(pdfFilePath);
ByteString content = ByteString.CopyFrom(bytes);
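To catch this kind of mistake before the request ever reaches the service, you could check that the bytes actually start with the PDF header signature "%PDF-". This is a minimal sketch, assuming a well-formed PDF whose header sits at byte 0 (some readers tolerate leading junk, which this check does not). The class and method names (PdfCheck, LooksLikePdf, ReadPdfBytes) are hypothetical helpers, not part of Google.Cloud.DocumentAI.V1.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the Document AI library): a cheap sanity
// check that a byte array looks like a PDF before it is sent for processing.
public static class PdfCheck
{
    // Well-formed PDF files begin with the ASCII signature "%PDF-" (e.g. "%PDF-1.7").
    private static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true only if the first bytes match the "%PDF-" signature.
    public static bool LooksLikePdf(byte[] bytes)
    {
        if (bytes == null || bytes.Length < Signature.Length)
        {
            return false;
        }
        for (int i = 0; i < Signature.Length; i++)
        {
            if (bytes[i] != Signature[i])
            {
                return false;
            }
        }
        return true;
    }

    // Reads the file's contents (not its path) and fails early if they do not look like a PDF.
    public static byte[] ReadPdfBytes(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        if (!LooksLikePdf(bytes))
        {
            throw new InvalidDataException($"'{path}' does not start with the %PDF- signature.");
        }
        return bytes;
    }
}
```

With this in place, `var bytes = PdfCheck.ReadPdfBytes(pdfFilePath);` would replace the File.ReadAllBytes line. Applied to the original code's `Encoding.UTF8.GetBytes(pdfFilePath)`, LooksLikePdf returns false because those bytes start with "C:\", so the problem surfaces locally instead of as an InvalidArgument from the service.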

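I'd also point out that InlineDocument and RawDocument are alternatives to each other (they belong to the same protobuf oneof), so setting one clears the other. Your request creation would be better written as: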

ProcessRequest request = new ProcessRequest
{
    ProcessorName = ProcessorName.FromProjectLocationProcessor("*****", "mycountry", "***"),
    SkipHumanReview = false,
    RawDocument = new RawDocument
    {
        MimeType = "application/pdf",
        Content = content
    }
};