Azure OCR [printed text] is not reading the receipt lines in the right order
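Application goal: read a receipt image, extract the store/organization name and the total amount paid, and feed both to a web form so it can be filled in and submitted automatically.
POST request - https://*.cognitiveservices.azure.com/vision/v2.0/recognizeText?{params}
GET request - https://*.cognitiveservices.azure.com/vision/v2.0/textOperations/{operationId}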
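For context, the two calls above form a submit-then-poll pattern: the POST starts the recognition, and the GET is repeated until the operation reaches a final status. The following is only a minimal sketch of the polling half. The names wait_for_text_operation and make_fetcher are hypothetical helpers, the status values other than "Succeeded" are assumptions about the v2.0 API, and the HTTP call is passed in as a function so the loop can be tried without a real subscription key.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict


def make_fetcher(subscription_key: str) -> Callable[[str], Dict[str, Any]]:
    """Build a function that GETs a URL with the Cognitive Services key header."""
    def fetch_json(url: str) -> Dict[str, Any]:
        request = urllib.request.Request(
            url, headers={"Ocp-Apim-Subscription-Key": subscription_key}
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch_json


def wait_for_text_operation(
    fetch_json: Callable[[str], Dict[str, Any]],
    operation_url: str,
    interval_s: float = 1.0,
    max_attempts: int = 30,
) -> Dict[str, Any]:
    """Poll the textOperations URL until the status is final or attempts run out."""
    for _ in range(max_attempts):
        result = fetch_json(operation_url)
        status = result.get("status")
        if status == "Succeeded":
            return result
        if status == "Failed":  # assumed failure status
            raise RuntimeError("text recognition failed")
        time.sleep(interval_s)  # any other status: wait and ask again
    raise TimeoutError("text operation did not finish in time")
```

In real use, operation_url would be the textOperations URL of the submitted request, and fetch_json would come from make_fetcher with your own key.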
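However, when I get the results, the line order is sometimes mixed up (see the image below [similar results in the JSON response]).
Because of this mix-up, the total is extracted as $0.88.
The same thing happens for 2 of my 9 test receipts.
Q: Why does it work for receipts with both similar and different structures, yet not consistently for all of them? Also, any ideas how to work around it?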
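I had a quick look at your case.
OCR result
As you mentioned, the result is not ordered the way you expected. I had a quick look at the bounding box values, but I don't know how they are ordered. You could try to consolidate the fields based on those values, but there is already a service that does this for you.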
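If you do want to try the bounding box route, one possible approach is to rebuild the rows yourself. The sketch below is not how the service orders its output. It assumes each boundingBox holds four x,y corner pairs, as in the response, and that two lines belong to the same printed row when their vertical centers are close. The function name, the tolerance value, and the sample coordinates are all made up for illustration.

```python
from typing import Any, Dict, List


def _center_y(box: List[float]) -> float:
    """Average of the four y coordinates (odd indices of the 8-value box)."""
    return sum(box[1::2]) / 4.0


def _height(box: List[float]) -> float:
    return max(box[1::2]) - min(box[1::2])


def _left(box: List[float]) -> float:
    return min(box[0::2])


def group_lines_into_rows(
    lines: List[Dict[str, Any]], tolerance: float = 0.5
) -> List[str]:
    """Group OCR lines into rows by vertical center, then read left to right."""
    ordered = sorted(lines, key=lambda ln: _center_y(ln["boundingBox"]))
    rows: List[List[Dict[str, Any]]] = []
    for line in ordered:
        box = line["boundingBox"]
        if rows:
            # Compare with the most recently added line of the current row.
            last = rows[-1][-1]["boundingBox"]
            limit = tolerance * max(_height(box), _height(last))
            if abs(_center_y(box) - _center_y(last)) <= limit:
                rows[-1].append(line)
                continue
        rows.append([line])
    return [
        " ".join(ln["text"] for ln in sorted(row, key=lambda ln: _left(ln["boundingBox"])))
        for row in rows
    ]


# Hypothetical coordinates, deliberately scrambled; not taken from the receipt.
sample_lines = [
    {"text": "Tax:", "boundingBox": [10, 80, 50, 80, 50, 95, 10, 95]},
    {"text": "Total:", "boundingBox": [10, 100, 60, 100, 60, 115, 10, 115]},
    {"text": "$0.88", "boundingBox": [300, 81, 350, 81, 350, 96, 300, 96]},
    {"text": "$9.11", "boundingBox": [300, 101, 350, 101, 350, 116, 300, 116]},
]
print(group_lines_into_rows(sample_lines))  # ['Tax: $0.88', 'Total: $9.11']
```

A fixed tolerance like this will likely struggle with skewed or crumpled receipts, which is one reason a dedicated receipt service is the easier option.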
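Form Recognizer:
Using Form Recognizer with your image, I got the following result for the receipt.
As you can see below, understandingResults contains the Total with its value ("value": 9.11), the MerchantName ("Chick-fil-a"), and other fields.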
{
"status": "Succeeded",
"recognitionResults": [
{
"page": 1,
"clockwiseOrientation": 0.17,
"width": 404,
"height": 1226,
"unit": "pixel",
"lines": [
{
"boundingBox": [
108,
55,
297,
56,
296,
71,
107,
70
],
"text": "Welcome to Chick-fil-a",
"words": [
{
"boundingBox": [
108,
56,
169,
56,
169,
71,
108,
71
],
"text": "Welcome",
"confidence": "Low"
},
{
"boundingBox": [
177,
56,
194,
56,
194,
71,
177,
71
],
"text": "to"
},
{
"boundingBox": [
201,
56,
296,
57,
296,
71,
201,
71
],
"text": "Chick-fil-a"
}
]
},
...
OTHER LINES CUT FOR DISPLAY
...
]
}
],
"understandingResults": [
{
"pages": [
1
],
"fields": {
"Subtotal": null,
"Total": {
"valueType": "numberValue",
"value": 9.11,
"text": "$9.11",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/32/words/0"
},
{
"$ref": "#/recognitionResults/0/lines/32/words/1"
}
]
},
"Tax": {
"valueType": "numberValue",
"value": 0.88,
"text": "$0.88",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/31/words/0"
},
{
"$ref": "#/recognitionResults/0/lines/31/words/1"
},
{
"$ref": "#/recognitionResults/0/lines/31/words/2"
}
]
},
"MerchantAddress": null,
"MerchantName": {
"valueType": "stringValue",
"value": "Chick-fil-a",
"text": "Chick-fil-a",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/0/words/2"
}
]
},
"MerchantPhoneNumber": {
"valueType": "stringValue",
"value": "+13092689500",
"text": "309-268-9500",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/4/words/0"
}
]
},
"TransactionDate": {
"valueType": "stringValue",
"value": "2019-06-21",
"text": "6/21/2019",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/6/words/0"
}
]
},
"TransactionTime": {
"valueType": "stringValue",
"value": "13:00:57",
"text": "1:00:57 PM",
"elements": [
{
"$ref": "#/recognitionResults/0/lines/6/words/1"
},
{
"$ref": "#/recognitionResults/0/lines/6/words/2"
}
]
}
}
}
]
}
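To feed the web form, the two values you need can be read straight out of understandingResults. The following is a small sketch that assumes the response has already been parsed into a Python dict with the shape shown above. receipt_field is a hypothetical helper name, and the sample dict is a trimmed copy of the response, not a full one.

```python
from typing import Any, Dict, Optional


def receipt_field(response: Dict[str, Any], name: str) -> Optional[Any]:
    """Return the parsed value of a receipt field, or None if it is absent or null."""
    for result in response.get("understandingResults", []):
        field = (result.get("fields") or {}).get(name)
        if field is not None:
            return field.get("value")
    return None


# Trimmed copy of the response above, keeping only what the lookup needs.
sample_response = {
    "status": "Succeeded",
    "understandingResults": [
        {
            "pages": [1],
            "fields": {
                "Subtotal": None,
                "Total": {"valueType": "numberValue", "value": 9.11},
                "MerchantName": {"valueType": "stringValue", "value": "Chick-fil-a"},
            },
        }
    ],
}

print(receipt_field(sample_response, "MerchantName"))  # Chick-fil-a
print(receipt_field(sample_response, "Total"))         # 9.11
```

Fields the service could not find, such as Subtotal here, come back as null, so the caller should be ready for None before filling in the form.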
More details on Form Recognizer: https://azure.microsoft.com/en-us/services/cognitive-services/form-recognizer/