How to highlight custom extractions using a2i's crowd-textract-analyze-document?

Question

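I want to create a human review loop for images that go through OCR with Amazon Textract and entity extraction with Amazon Comprehend.

My process is:

1. Send the image to Textract to extract the text
2. Send the text to Comprehend to extract entities
3. Find the block IDs in the Textract output for the entities extracted by Comprehend
4. Add new blocks of type KEY_VALUE_SET to Textract's JSON output, per the docs
5. Create a human task that uses the crowd-textract-analyze-document element in its template, and feed it the modified Textract output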
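
Step 3, plus the geometry that a new block in step 4 would need, could be sketched roughly as follows. This is only an illustration, not the asker's actual code: `find_word_block_ids` and `union_bounding_box` are hypothetical helper names, the sketch assumes Textract's usual block shape (`BlockType`, `Id`, `Text`, and `Geometry.BoundingBox` with `Left`/`Top`/`Width`/`Height`), and it matches entity text against WORD blocks by exact whitespace-split tokens, so words with attached punctuation or different casing would be missed.

```python
from __future__ import annotations

from typing import Any


def find_word_block_ids(blocks: list[dict[str, Any]], entity_text: str) -> list[str]:
    """Return the Ids of the first run of consecutive WORD blocks whose
    texts equal the whitespace-split tokens of entity_text.

    Assumption: WORD blocks appear in reading order in `blocks`.
    Returns an empty list when no run matches.
    """
    words = [b for b in blocks if b.get("BlockType") == "WORD"]
    target = entity_text.split()
    if not target:
        return []
    for start in range(len(words) - len(target) + 1):
        window = words[start:start + len(target)]
        if [w.get("Text") for w in window] == target:
            return [w["Id"] for w in window]
    return []


def union_bounding_box(boxes: list[dict[str, float]]) -> dict[str, float]:
    """Smallest box (Left/Top/Width/Height) that encloses every given box."""
    left = min(b["Left"] for b in boxes)
    top = min(b["Top"] for b in boxes)
    right = max(b["Left"] + b["Width"] for b in boxes)
    bottom = max(b["Top"] + b["Height"] for b in boxes)
    return {"Left": left, "Top": top, "Width": right - left, "Height": bottom - top}
```

The IDs returned by `find_word_block_ids` would go into the `Ids` list of a relationship on the new block, and the union box could serve as that block's `Geometry.BoundingBox`; whether the widget uses that geometry for highlighting is not confirmed here.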

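The step that fails in this process is step 5. My custom entities are not rendered properly. By "not working" I mean that the entities are not highlighted on the image when I click them in the sidebar. The browser console shows no errors.

Has anyone tried something like this?

Sorry for not including examples. I will strip the secrets/PII from my files and attach them to the question.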

Answer 1

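I used the AWS documentation for the a2i-crowd-textract-detection human task element to generate the value of the initialValue attribute. The documentation for that attribute appears to be incorrect. Although the docs show that the value should be in the same format as Textract's output, i.e.: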

[
        {
            "BlockType": "KEY_VALUE_SET",
            "Confidence": 38.43309020996094,
            "Geometry": { ... },
            "Id": "8c97b240-0969-4678-834a-646c95da9cf4",
            "Relationships": [
                { "Type": "CHILD", "Ids": [...]},
                { "Type": "VALUE", "Ids": [...]}
            ],
            "EntityTypes": ["KEY"],
            "Text": "Foo bar"
        },
]

a2i-crowd-textract-detection expects the input to have lowerCamelCase attribute names (rather than UpperCamelCase). For example:

[
        {
            "blockType": "KEY_VALUE_SET",
            "confidence": 38.43309020996094,
            "geometry": { ... },
            "id": "8c97b240-0969-4678-834a-646c95da9cf4",
            "relationships": [
                { "Type": "CHILD", "ids": [...]},
                { "Type": "VALUE", "ids": [...]}
            ],
            "entityTypes": ["KEY"],
            "text": "Foo bar"
        },
]

I have filed a support case with AWS about this documentation error.
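
The key renaming described in this answer could be done with a small recursive helper, sketched below. `lower_camel_keys` and its `keep` parameter are hypothetical names introduced for illustration. By default the sketch lowercases the first letter of every key at every nesting level, which also turns the nested `Type` key into `type`; the example in this answer leaves `Type` capitalised, and which form the element actually needs is not confirmed here.

```python
from __future__ import annotations

from typing import Any


def lower_first(name: str) -> str:
    """Lowercase only the first character: 'BlockType' -> 'blockType'."""
    return name[:1].lower() + name[1:]


def lower_camel_keys(obj: Any, keep: frozenset[str] = frozenset()) -> Any:
    """Recursively rename dict keys from UpperCamelCase to lowerCamelCase.

    Keys listed in `keep` are left unchanged. Values are never modified,
    only keys; lists are walked element by element.
    """
    if isinstance(obj, dict):
        return {
            (key if key in keep else lower_first(key)): lower_camel_keys(value, keep)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [lower_camel_keys(item, keep) for item in obj]
    return obj
```

Calling `lower_camel_keys(blocks)` on the list of Textract blocks yields the structure to serialise into initialValue. Passing `keep=frozenset({"Type"})` reproduces the capitalisation shown in the example in this answer.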

amazon-web-services

amazon-sagemaker

amazon-textract

amazon-comprehend