Databricks Azure Form Recognizer library for Python failing: Content type could not be auto-detected. Please pass the content_type keyword argument

Question

I am following the guide at https://docs.microsoft.com/en-us/python/api/overview/azure/ai-formrecognizer-readme?view=azure-python to recognize content from Databricks.

The code I am using is:

from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://<region>.api.cognitive.microsoft.com/"
credential = AzureKeyCredential("<api_key>")

form_recognizer_client = FormRecognizerClient(endpoint, credential)

with open("/dbfs/mnt/lake/RAW/export/sentimenttest.txt", "rb") as fd:
    form = fd.read()

poller = form_recognizer_client.begin_recognize_content(form)
form_pages = poller.result()

for content in form_pages:
    for table in content.tables:
        print("Table found on page {}:".format(table.page_number))
        print("Table location {}:".format(table.bounding_box))
        for cell in table.cells:
            print("Cell text: {}".format(cell.text))
            print("Location: {}".format(cell.bounding_box))
            print("Confidence score: {}\n".format(cell.confidence))

    if content.selection_marks:
        print("Selection marks found on page {}:".format(content.page_number))
        for selection_mark in content.selection_marks:
            print("Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
                selection_mark.state,
                selection_mark.bounding_box,
                selection_mark.confidence
            ))

Note that the path I am using is

/dbfs/mnt/lake/RAW/export/sentimenttest.txt

When I run the code, I get this error:

ValueError: Content type could not be auto-detected. Please pass the content_type keyword argument.

Can someone tell me what I need to do to fix this?

Answer 1

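Prerequisites

• Python 2.7, or 3.5 or later, is required to use this package.

• You must have an Azure subscription and a Cognitive Services or Form Recognizer resource to use this package.

begin_recognize_content extracts text and content/layout information from a given document. The input document must be one of the supported content types: 'application/pdf', 'image/jpeg', 'image/png', 'image/tiff', or 'image/bmp'. A plain-text file such as sentimenttest.txt is not among them, which would explain why the content type cannot be auto-detected.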
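To illustrate why a .txt file is rejected, here is a minimal sketch of content-type detection based on the leading bytes of a file. The signature table and the helper name guess_content_type are assumptions made for illustration; they are not taken from the SDK, whose internal detection may differ.

```python
from typing import Optional

# Leading "magic bytes" for the supported content types.
# Illustrative assumption, not the SDK's own table.
SIGNATURES = [
    (b"%PDF", "application/pdf"),
    (b"\xff\xd8", "image/jpeg"),
    (b"\x89PNG", "image/png"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"BM", "image/bmp"),
]


def guess_content_type(data: bytes) -> Optional[str]:
    """Return a supported content type, or None if no signature matches."""
    for magic, content_type in SIGNATURES:
        if data.startswith(magic):
            return content_type
    return None


# Plain text matches no signature, so nothing can be detected.
print(guess_content_type(b"I love this product"))
# A PDF header is recognized.
print(guess_content_type(b"%PDF-1.7"))
```

If the file really is one of the supported types, the content type can be passed explicitly, as the error message suggests, for example begin_recognize_content(form, content_type="application/pdf"). A plain-text file would most likely need to be converted to a PDF or an image first, since text is not in the supported list.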

New in version v2.1: the pages, language, and reading_order keyword arguments, and support for image/bmp content.

See this link for more information.

azure-cognitive-services

azure-databricks