Filtering out data returned by AWS Textract function

I have extracted the data returned by an AWS Textract function. The return value of this Textract function has the following shape:

{
   "AnalyzeDocumentModelVersion": "string",
   "Blocks": [ 
      { 
         "BlockType": "string",
         "ColumnIndex": number,
         "ColumnSpan": number,
         "Confidence": number,
         "EntityTypes": [ "string" ],
         "Geometry": { 
            "BoundingBox": { 
               "Height": number,
               "Left": number,
               "Top": number,
               "Width": number
            },
            "Polygon": [ 
               { 
                  "X": number,
                  "Y": number
               }
            ]
         },
         "Id": "string",
         "Page": number,
         "Relationships": [ 
            { 
               "Ids": [ "string" ],
               "Type": "string"
            }
         ],
         "RowIndex": number,
         "RowSpan": number,
         "SelectionStatus": "string",
         "Text": "string"
      }
   ],
   "DocumentMetadata": { 
      "Pages": number
   },
   "JobStatus": "string",
   "NextToken": "string",
   "StatusMessage": "string",
   "Warnings": [ 
      { 
         "ErrorCode": "string",
         "Pages": [ number ]
      }
   ]
}

I extracted the blocks from this data with the following code:

var d = null;
...<Some Code Here>...
d = data.Blocks;
console.log(d);

This prints an array of JSON objects. A sample of the extracted output is shown below:

[...{ BlockType: 'WORD',
    Confidence: 99.7286376953125,
    Text: '2000.00',
    Geometry: { BoundingBox: [Object], Polygon: [Array] },
    Id: '<ID here>',
    Page: 1 }, ...]

I only want to extract the Text field and have that as the only output. How do I get started?

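I may have misunderstood your question, but if you need to extract the value of the Text field from each object in the data array, see the example below: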

const data = [
  {
    BlockType: "WORD",
    Confidence: 99.7286376953125,
    Text: "2000.00",
    Geometry: { BoundingBox: {}, Polygon: [] },
    Id: "<ID here>",
    Page: 1,
  },
];

const output = data.map(({ Text: text }) => text);

console.log(output);
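One caveat with mapping over every block: in a Textract response, blocks such as PAGE carry no Text field, so a plain map would yield undefined for them, and LINE blocks repeat the text of the WORD blocks they contain. A minimal sketch of a filter-then-map approach follows. The sample blocks array and the extractText helper are hypothetical, made up for illustration, and are not part of the AWS SDK.

```javascript
// Hypothetical sample shaped like Textract's Blocks array; Ids and text are made up.
const blocks = [
  { BlockType: "PAGE", Id: "page-1", Page: 1 },
  { BlockType: "LINE", Text: "Total 2000.00", Id: "line-1", Page: 1 },
  { BlockType: "WORD", Text: "Total", Id: "word-1", Page: 1 },
  { BlockType: "WORD", Text: "2000.00", Id: "word-2", Page: 1 },
];

// extractText is a hypothetical helper: it keeps only blocks of one type
// that actually have a string Text field, then returns just the text.
function extractText(blockList, blockType = "WORD") {
  return blockList
    .filter((b) => b.BlockType === blockType && typeof b.Text === "string")
    .map((b) => b.Text);
}

const words = extractText(blocks); // one entry per WORD block
const lines = extractText(blocks, "LINE"); // one entry per LINE block

console.log(words);
console.log(lines);
```

With the real response you would pass data.Blocks instead of the sample array. Whether to read WORD or LINE blocks depends on whether you want individual tokens or whole lines of text.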