Display Azure OCR output in logical order / reading direction

Question

I have Azure OCR output produced by this JSON-reading script (Microsoft template code):

# Extract the word bounding boxes and text.
line_infos = [region["lines"] for region in analysis["regions"]]
word_infos = []
for line in line_infos:
    for word_metadata in line:
        for word_info in word_metadata["words"]:
            word_infos.append(word_info)
word_infos

Output:

{'boundingBox': '183,73,624,102', 'text': 'This'},
{'boundingBox': '851,100,160,67', 'text': 'person'},
{'boundingBox': '1052,109,448,97', 'text': 'plays.'},
...

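Problem: these three words belong to a single line of the scanned document, but the Azure OCR output gives each of them a different bounding box. Can I adjust a bounding-box threshold in the OCR service? Or is there a neat helper function that analyzes the bounding-box coordinates and groups the nearest neighbours together?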

The desired output is:

{'boundingBox': 'xxx,xx,xxx,xxx', 'text': 'This person plays.'}

Answer 1

Your code already retrieves the line-level information:

line_infos = [region["lines"] for region in analysis["regions"]]

For example, I ran OCR on this image.

Below is the output of line_infos:

[{'boundingBox': '28,16,288,41', 'words': [{'boundingBox': '28,16,288,41', 'text': 'NOTHING'}]}, {'boundingBox': '27,66,283,52', 'words': [{'boundingBox': '27,66,283,52', 'text': 'EXISTS'}]}, {'boundingBox': '27,128,292,49', 'words': [{'boundingBox': '27,128,292,49', 'text': 'EXCEPT'}]}, {'boundingBox': '24,188,292,54', 'words': [{'boundingBox': '24,188,292,54', 'text': 'ATOMS'}]}, {'boundingBox': '22,253,297,32', 'words': [{'boundingBox': '22,253,105,32', 'text': 'AND'}, {'boundingBox': '144,253,175,32', 'text': 'EMPTY'}]}, {'boundingBox': '21,298,304,60', 'words': [{'boundingBox': '21,298,304,60', 'text': 'SPACE.'}]}, {'boundingBox': '26,387,294,37', 'words': [{'boundingBox': '26,387,210,37', 'text': 'Everything'}, {'boundingBox': '249,389,71,27', 'text': 'else'}]}, {'boundingBox': '127,431,198,36', 'words': [{'boundingBox': '127,431,31,29', 'text': 'is'}, {'boundingBox': '172,431,153,36', 'text': 'opinion.'}]}]

Let's take a closer look at the output for "Everything else" in the image, since those two words sit on the same line:

{'boundingBox': '26,387,294,37', 'words': [{'boundingBox': '26,387,210,37', 'text': 'Everything'}, {'boundingBox': '249,389,71,27', 'text': 'else'}]}
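A line entry like the one above already carries everything needed for the single-entry form requested in the question. The following is a minimal sketch, not part of the Microsoft template: `flatten_line` is a hypothetical helper name, and it assumes each line dict has the `boundingBox` and `words` keys shown in the sample output.

```python
def flatten_line(line):
    """Collapse one OCR line entry into a single box-plus-text dict."""
    # Join the words in the order the service returned them.
    text = " ".join(word["text"] for word in line["words"])
    # Reuse the line-level box that the service already reports.
    return {"boundingBox": line["boundingBox"], "text": text}


# Sample line entry copied from the line_infos output above.
line = {
    "boundingBox": "26,387,294,37",
    "words": [
        {"boundingBox": "26,387,210,37", "text": "Everything"},
        {"boundingBox": "249,389,71,27", "text": "else"},
    ],
}

print(flatten_line(line))
# {'boundingBox': '26,387,294,37', 'text': 'Everything else'}
```

Applying the helper to every line of every region gives one dict per recognized line.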

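The words are already grouped at the line level, so you only need to extract them per line. Here is a modified code sample that does the extraction line by line: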

line_num = 0
for region_lines in line_infos:   # one list of lines per region
    for line in region_lines:
        line_num += 1
        # Collect the text of every word on this line.
        words = [word_info["text"] for word_info in line["words"]]
        print(line_num)
        print(words)

Output of the code snippet:
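If the service still splits one visual row across several lines or regions, as in the question's sample, the words can be regrouped from their coordinates. The sketch below is one possible helper, not an Azure API: `parse_box`, `group_words_into_rows` and the `min_overlap` parameter are hypothetical names, and it assumes boxes are `left,top,width,height` strings and that text runs left to right. A word joins a row when its vertical overlap with that row covers at least `min_overlap` of the shorter of the two heights.

```python
def parse_box(box):
    """Split a 'left,top,width,height' string into four ints."""
    left, top, width, height = (int(v) for v in box.split(","))
    return left, top, width, height


def group_words_into_rows(words, min_overlap=0.5):
    """Greedily group vertically overlapping words, then merge each group."""
    rows = []  # each row: {"top", "bottom", "words": [(left, top, right, bottom, text)]}
    # Visit words from top to bottom so rows come out in reading order.
    for word in sorted(words, key=lambda w: parse_box(w["boundingBox"])[1]):
        left, top, width, height = parse_box(word["boundingBox"])
        bottom = top + height
        item = (left, top, left + width, bottom, word["text"])
        for row in rows:
            overlap = min(bottom, row["bottom"]) - max(top, row["top"])
            smaller = min(height, row["bottom"] - row["top"])
            if smaller > 0 and overlap >= min_overlap * smaller:
                row["words"].append(item)
                # The row's vertical span grows to cover the new word.
                row["top"] = min(row["top"], top)
                row["bottom"] = max(row["bottom"], bottom)
                break
        else:
            rows.append({"top": top, "bottom": bottom, "words": [item]})

    merged = []
    for row in rows:
        items = sorted(row["words"])  # order by left edge
        left = min(i[0] for i in items)
        top = min(i[1] for i in items)
        right = max(i[2] for i in items)
        bottom = max(i[3] for i in items)
        merged.append({
            "boundingBox": f"{left},{top},{right - left},{bottom - top}",
            "text": " ".join(i[4] for i in items),
        })
    return merged


# The three words from the question.
words = [
    {"boundingBox": "183,73,624,102", "text": "This"},
    {"boundingBox": "851,100,160,67", "text": "person"},
    {"boundingBox": "1052,109,448,97", "text": "plays."},
]

print(group_words_into_rows(words))
# [{'boundingBox': '183,73,1317,133', 'text': 'This person plays.'}]
```

Because each row's span grows as words are added, heavily skewed scans can chain neighbouring rows together; raising `min_overlap` makes the grouping stricter.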
