Is there a way to show pdf in its original structure in the human review custom entity labelling in aws sagemaker?

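I have modified this sample to read PDFs that are in tabular format. During the human review process, I would like to retain the tabular structure of the original PDF. I noticed that the custom worker task template uses the crowd-entity-annotation element, which seems to read only text. I am aware that the human review process reads from an S3 key that contains the raw text written by the Textract process.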

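I have been considering writing the tables to S3, but I do not think that is the best solution. I would like to preserve the structure and still be able to annotate custom entities.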

Comprehend now natively supports detecting custom entities in PDF documents. To do so, you can try the following steps:

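Follow this GitHub README to start the annotation process for PDF documents.
Once the annotations are generated, you can use the Comprehend CreateEntityRecognizer API to train a custom entity model for semi-structured documents.
After the entity recognizer is trained, you can use the StartEntitiesDetectionJob API to run inference on your PDF documents.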
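
The last two steps can be sketched with boto3 roughly as follows. This is a minimal sketch, not a tested pipeline: every bucket path, ARN, name, and entity type you pass in is a placeholder you would supply, the attribute names must match what your annotation job actually wrote to its manifest, and the language code is assumed to be English. The request builders are kept separate from the AWS calls so the payloads can be inspected before anything is submitted.

```python
from typing import Any, Dict, List


def build_recognizer_request(
    name: str,
    role_arn: str,
    entity_types: List[str],
    manifest_uri: str,
    annotations_uri: str,
    documents_uri: str,
    attribute_names: List[str],
) -> Dict[str, Any]:
    """Build CreateEntityRecognizer parameters for annotated PDFs."""
    return {
        "RecognizerName": name,
        "DataAccessRoleArn": role_arn,
        "LanguageCode": "en",  # assumption: English documents
        "InputDataConfig": {
            "DataFormat": "AUGMENTED_MANIFEST",
            "EntityTypes": [{"Type": t} for t in entity_types],
            "AugmentedManifests": [
                {
                    "S3Uri": manifest_uri,
                    "AttributeNames": attribute_names,
                    "AnnotationDataS3Uri": annotations_uri,
                    "SourceDocumentsS3Uri": documents_uri,
                    # Marks the training data as PDFs rather than plain text.
                    "DocumentType": "SEMI_STRUCTURED_DOCUMENT",
                }
            ],
        },
    }


def build_detection_job_request(
    job_name: str,
    role_arn: str,
    recognizer_arn: str,
    input_uri: str,
    output_uri: str,
) -> Dict[str, Any]:
    """Build StartEntitiesDetectionJob parameters for a folder of PDFs."""
    return {
        "JobName": job_name,
        "DataAccessRoleArn": role_arn,
        "EntityRecognizerArn": recognizer_arn,
        "LanguageCode": "en",  # assumption: English documents
        # Each PDF under the input prefix is treated as one document.
        "InputDataConfig": {"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        "OutputDataConfig": {"S3Uri": output_uri},
    }


def train_recognizer(request: Dict[str, Any]) -> str:
    """Submit the training request and return the recognizer ARN."""
    import boto3  # imported here so the builders work without AWS access

    client = boto3.client("comprehend")
    return client.create_entity_recognizer(**request)["EntityRecognizerArn"]


def start_detection(request: Dict[str, Any]) -> str:
    """Submit the inference job and return its job ID."""
    import boto3

    client = boto3.client("comprehend")
    return client.start_entities_detection_job(**request)["JobId"]
```

Training is asynchronous, so in practice you would poll `describe_entity_recognizer` until the status is `TRAINED` before calling `start_detection` with the returned ARN.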

amazon-sagemaker

amazon-textract

amazon-comprehend