Databricks Azure Form Recognizer library for Python failing: Content type could not be auto-detected. Please pass the content_type keyword argument
I am following the guide at https://docs.microsoft.com/en-us/python/api/overview/azure/ai-formrecognizer-readme?view=azure-python to recognize content from Databricks.
The code I am using is:
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://<region>.api.cognitive.microsoft.com/"
credential = AzureKeyCredential("<api_key>")
form_recognizer_client = FormRecognizerClient(endpoint, credential)

with open("/dbfs/mnt/lake/RAW/export/sentimenttest.txt", "rb") as fd:
    form = fd.read()

poller = form_recognizer_client.begin_recognize_content(form)
form_pages = poller.result()

for content in form_pages:
    for table in content.tables:
        print("Table found on page {}:".format(table.page_number))
        print("Table location {}:".format(table.bounding_box))
        for cell in table.cells:
            print("Cell text: {}".format(cell.text))
            print("Location: {}".format(cell.bounding_box))
            print("Confidence score: {}\n".format(cell.confidence))

    if content.selection_marks:
        print("Selection marks found on page {}:".format(content.page_number))
        for selection_mark in content.selection_marks:
            print("Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
                selection_mark.state,
                selection_mark.bounding_box,
                selection_mark.confidence
            ))
You will notice that the path I am using is
/dbfs/mnt/lake/RAW/export/sentimenttest.txt
When I run the code I get the error:
ValueError: Content type could not be auto-detected. Please pass the content_type keyword argument.
Can anyone tell me what I need to do to fix this?
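Prerequisites
• Python 2.7, or 3.5 or later, is required to use this package.
• You must have an Azure subscription and a Cognitive Services or Form Recognizer resource to use this package.
begin_recognize_content extracts text and content/layout information from a given document. The input document must be one of the supported content types: 'application/pdf', 'image/jpeg', 'image/png', 'image/tiff' or 'image/bmp'. A .txt file is not one of these types, which is most likely why the content type cannot be auto-detected.
New in version v2.1: the pages, language and reading_order keyword arguments, and support for image/bmp content.
See this link for more information.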
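As a rough sketch of how the supported-type rule above could be checked before calling the service: the helper names `content_type_for` and `recognize_content` and the extension table are my own illustrations, not part of the SDK. The only SDK feature relied on is the `content_type` keyword argument that the error message itself asks for. The file extension is used as a simple stand-in for the real file format, which is an assumption.

```python
import os

# Hypothetical lookup table: maps file extensions to the content types
# listed above as supported by begin_recognize_content.
EXTENSION_TO_CONTENT_TYPE = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".bmp": "image/bmp",
}


def content_type_for(path):
    """Return the content type implied by the file extension.

    Raises ValueError for anything outside the supported list (e.g. .txt),
    so the problem surfaces before any request is sent.
    """
    extension = os.path.splitext(path)[1].lower()
    if extension not in EXTENSION_TO_CONTENT_TYPE:
        raise ValueError(
            "Unsupported file type {!r}; expected one of {}".format(
                extension, sorted(EXTENSION_TO_CONTENT_TYPE)
            )
        )
    return EXTENSION_TO_CONTENT_TYPE[extension]


def recognize_content(client, path):
    """Read the file and pass content_type explicitly to the client.

    `client` is assumed to be a FormRecognizerClient built as in the question.
    """
    content_type = content_type_for(path)
    with open(path, "rb") as fd:
        form = fd.read()
    poller = client.begin_recognize_content(form, content_type=content_type)
    return poller.result()
```

With this, `recognize_content(form_recognizer_client, "/dbfs/mnt/lake/RAW/export/sentimenttest.txt")` would raise a ValueError locally, which suggests converting the text file to a PDF or image first. Passing `content_type` by hand for a .txt file would not make the document type supported.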