处理来自 GCS 的文档时,文档 AI 处理文档因参数无效而失败
Document AI process document fails with invalid argument when processing docs from GCS
我收到与以下非常相似的错误,但我不在欧盟:
当我使用 raw_document 处理本地 pdf 文件时,它工作正常。但是,当我在 GCS 位置指定 pdf 文件时,它失败了。
错误信息:
the processor name: projects/xxxxxxxxx/locations/us/processors/f7502cad4bccdd97
the form process request: name: "projects/xxxxxxxxx/locations/us/processors/f7502cad4bccdd97"
inline_document {
uri: "gs://xxxx/temp/test1.pdf"
}
Traceback (most recent call last):
File "C:\Python39\lib\site-packages\google\api_core\grpc_helpers.py", line 66, in error_remapped_callable
return callable_(*args, **kwargs)
File "C:\Python39\lib\site-packages\grpc\_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "C:\Python39\lib\site-packages\grpc\_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Request contains an invalid argument."
debug_error_string = "{"created":"@1647296055.582000000","description":"Error received from peer ipv4:142.250.80.74:443","file":"src/core/lib/surface/call.cc","file_line":1070,"grpc_message":"Request contains an invalid argument.","grpc_status":3}"
>
代码:
client = documentai.DocumentProcessorServiceClient(client_options=opts)
# The full resource name of the processor, e.g.:
# projects/project-id/locations/location/processor/processor-id
# You must create new processors in the Cloud Console first
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
print(f'the processor name: {name}')
# document = {"uri": gcs_path, "mime_type": "application/pdf"}
document = {"uri": gcs_path}
inline_document = documentai.Document()
inline_document.uri = gcs_path
# inline_document.mime_type = "application/pdf"
# Configure the process request
# request = {"name": name, "inline_document": document}
request = documentai.ProcessRequest(
inline_document=inline_document,
name=name
)
print(f'the form process request: {request}')
result = client.process_document(request=request)
我认为我在存储桶上没有权限问题,因为相同的设置适用于同一存储桶上的文档分类过程。
这是 Document AI 的已知问题,已在 issue tracker 中报告。不幸的是,目前唯一的解决方法是:
- 下载您的文件,以字节形式读取文件并使用 process_documents(). See Document AI local processing 作为示例代码。
- 使用 batch_process_documents() 因为默认情况下只接受来自 GCS 的文件。如果您不想在下载文件时执行额外的步骤。
我收到与以下非常相似的错误,但我不在欧盟:
当我使用 raw_document 处理本地 pdf 文件时,它工作正常。但是,当我在 GCS 位置指定 pdf 文件时,它失败了。
错误信息:
the processor name: projects/xxxxxxxxx/locations/us/processors/f7502cad4bccdd97
the form process request: name: "projects/xxxxxxxxx/locations/us/processors/f7502cad4bccdd97"
inline_document {
uri: "gs://xxxx/temp/test1.pdf"
}
Traceback (most recent call last):
File "C:\Python39\lib\site-packages\google\api_core\grpc_helpers.py", line 66, in error_remapped_callable
return callable_(*args, **kwargs)
File "C:\Python39\lib\site-packages\grpc\_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "C:\Python39\lib\site-packages\grpc\_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Request contains an invalid argument."
debug_error_string = "{"created":"@1647296055.582000000","description":"Error received from peer ipv4:142.250.80.74:443","file":"src/core/lib/surface/call.cc","file_line":1070,"grpc_message":"Request contains an invalid argument.","grpc_status":3}"
>
代码:
client = documentai.DocumentProcessorServiceClient(client_options=opts)
# The full resource name of the processor, e.g.:
# projects/project-id/locations/location/processor/processor-id
# You must create new processors in the Cloud Console first
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
print(f'the processor name: {name}')
# document = {"uri": gcs_path, "mime_type": "application/pdf"}
document = {"uri": gcs_path}
inline_document = documentai.Document()
inline_document.uri = gcs_path
# inline_document.mime_type = "application/pdf"
# Configure the process request
# request = {"name": name, "inline_document": document}
request = documentai.ProcessRequest(
inline_document=inline_document,
name=name
)
print(f'the form process request: {request}')
result = client.process_document(request=request)
我认为我在存储桶上没有权限问题,因为相同的设置适用于同一存储桶上的文档分类过程。
这是 Document AI 的已知问题,已在 issue tracker 中报告。不幸的是,目前唯一的解决方法是:
- 下载您的文件,以字节形式读取文件并使用 process_documents(). See Document AI local processing 作为示例代码。
- 使用 batch_process_documents() 因为默认情况下只接受来自 GCS 的文件。如果您不想在下载文件时执行额外的步骤。