Why can't Form Recognizer SDK v3 find any OCR documents to train?

Question

I am trying to build a Form Recognizer custom model with the v3 preview, using the sample code:

Uri trainingFileUri = new Uri(sasToken);
var client = new DocumentModelAdministrationClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

BuildModelOperation operation = await client.StartBuildModelAsync(trainingFileUri);

Response<DocumentModel> operationResponse = await operation.WaitForCompletionAsync();

The SAS token is for a blob container that holds 20 PDF files. When I run the code, I get this error:

Status: 200 (OK) ErrorCode: InvalidRequest

Additional Information: AdditionInformation: InvalidRequest: Invalid request.

Details: ModelBuildError: Could not build the model: Can't find any OCR files for training.

Raw:

{ "code": "InvalidRequest", "message": "Invalid request.", "details": [ { "code": "ModelBuildError", "message": "Could not build the model: Can\u0027t find any OCR files for training." } ] }

The SAS token has read, write, and list permissions, so I don't know why the client can't find the documents to train on. Any ideas?

Answer 1

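The preview API you linked to does not support training without labels. You will need a labeled dataset to train a model.

Did you use Form Recognizer Studio to label your documents?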

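Training a model requires your storage account to contain three types of files:

A single file - fields.json
For each file in the training dataset, two additional files that are created during labeling
- {file name}.labels.json
- {file name}.ocr.json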

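The error message suggests that you have not labeled your documents: the service finds no .ocr.json files in the container.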
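
To check a container before calling StartBuildModelAsync, something along these lines may help. It is only a minimal sketch: TrainingSetCheck and FindMissingFiles are hypothetical names, not part of any SDK, and it assumes that every training document is a PDF, that {file name} includes the .pdf extension (for example invoice.pdf.ocr.json), and that the blob names are given relative to the training folder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Form Recognizer SDK.
public static class TrainingSetCheck
{
    // Returns the names of the files a labeled dataset appears to be missing.
    public static List<string> FindMissingFiles(IEnumerable<string> blobNames)
    {
        // Blob names are case-sensitive, so compare them ordinally.
        var names = new HashSet<string>(blobNames, StringComparer.Ordinal);
        var missing = new List<string>();

        // One fields.json is expected for the whole dataset.
        if (!names.Contains("fields.json"))
        {
            missing.Add("fields.json");
        }

        // Assumption: every PDF in the listing is a training document.
        var documents = names
            .Where(n => n.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            .OrderBy(n => n, StringComparer.Ordinal);

        foreach (string document in documents)
        {
            // Each document needs the two companion files created during labeling.
            foreach (string suffix in new[] { ".labels.json", ".ocr.json" })
            {
                if (!names.Contains(document + suffix))
                {
                    missing.Add(document + suffix);
                }
            }
        }

        return missing;
    }
}
```

The blob names could come from listing the container, for instance with BlobContainerClient.GetBlobs from Azure.Storage.Blobs and the Name property of each item. For a container that holds only PDFs, as in the question, the result would list fields.json plus a .labels.json and an .ocr.json entry for every document.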

Tags: .net, ocr, sdk, form-recognizer, azure-form-recognizer