Azure Search - Cannot merge (with skill) data obtained from the KeyPhraseExtractionSkill

Question

I am creating an indexer that takes a document, runs the KeyPhraseExtractionSkill on it, and writes the output back to the index.

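For many documents this works out of the box. But for records longer than 50,000 characters, it does not work. OK, no problem; this is clearly stated in the documentation.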

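The documentation suggests using the Text Split Skill. What I did was use the Text Split Skill to split the original document into pages and pass all the pages to the KeyPhraseExtractionSkill. Then we need to merge the results back, because we end up with an array of strings for each page, that is, an array of arrays. Unfortunately, the Merge Skill does not seem to accept an array of arrays, only a single array.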

https://i.imgur.com/dBD4qgb.png <- Link to the skillset hierarchy.

This is the error reported by Azure:

Required skill input was not of the expected type 'StringCollection'. Name: 'itemsToInsert', Source: '/document/content/pages/*/keyPhrases'. Expression language parsing issues:

What I ultimately want to achieve is to run the KeyPhraseExtractionSkill on text longer than 50,000 characters and add the results back to the index.

Skillset JSON:

{
  "@odata.context": "https://-----------.search.windows.net/$metadata#skillsets/$entity",
  "@odata.etag": "\"0x8D957466A2C1E47\"",
  "name": "devalbertcollectionfilesskillset2",
  "description": null,
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "SplitSkill",
      "description": null,
      "context": "/document/content",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 1000,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
      "name": "EntityRecognitionSkill",
      "description": null,
      "context": "/document/content/pages/*",
      "categories": [
        "person",
        "quantity",
        "organization",
        "url",
        "email",
        "location",
        "datetime"
      ],
      "defaultLanguageCode": "en",
      "minimumPrecision": null,
      "includeTypelessEntities": null,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "persons",
          "targetName": "people"
        },
        {
          "name": "organizations",
          "targetName": "organizations"
        },
        {
          "name": "entities",
          "targetName": "entities"
        },
        {
          "name": "locations",
          "targetName": "locations"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
      "name": "KeyPhraseExtractionSkill",
      "description": null,
      "context": "/document/content/pages/*",
      "defaultLanguageCode": "en",
      "maxKeyPhraseCount": null,
      "modelVersion": null,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "keyPhrases",
          "targetName": "keyPhrases"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "name": "Merge Skill - keyPhrases",
      "description": null,
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name": "itemsToInsert",
          "source": "/document/content/pages/*/keyPhrases"
        }
      ],
      "outputs": [
        {
          "name": "mergedText",
          "targetName": "keyPhrases"
        }
      ]
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
    "key": "------",
    "description": "/subscriptions/13abe1c6-d700-4f8f-916a-8d3bc17bb41e/resourceGroups/mde-dev-rg/providers/Microsoft.CognitiveServices/accounts/mde-dev-cognitive"
  },
  "knowledgeStore": null,
  "encryptionKey": null
}

Please let me know if there is anything else that I can add to improve the question. Thanks!



Answer 1

You don't have to merge the key phrase outputs in order to insert them into the index.

Assuming your index already has a field named mykeyphrases of type Collection(Edm.String), add this indexer output field mapping to populate it with the key phrase outputs:

"outputFieldMappings": [
  ...

  {
    "sourceFieldName": "/document/content/pages/*/keyPhrases/*",
    "targetFieldName": "mykeyphrases"
  },

  ...
]
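
For reference, the mykeyphrases field assumed above might be declared in the index's fields array roughly as follows. This is only a sketch: the field name is taken from the mapping above, and the attribute choices (searchable, filterable, facetable, retrievable) are assumptions to adjust to your own needs.

```json
{
  "name": "mykeyphrases",
  "type": "Collection(Edm.String)",
  "searchable": true,
  "filterable": true,
  "facetable": true,
  "retrievable": true
}
```

The part that matters here is the Collection(Edm.String) type, which lets the field receive a flat list of strings.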

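The /* at the end of sourceFieldName is important: it flattens the arrays of strings into a single collection. The same path also works as a skill input if you want to pass the array of strings to another skill for further enrichment.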
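
To make the flattening concrete, here is a simplified picture of the relevant part of the enrichment tree for a document that was split into two pages. The key phrases are made up for illustration, and the real tree also holds the page text and the other skill outputs.

```json
{
  "document": {
    "content": {
      "pages": [
        { "keyPhrases": ["cognitive search", "skillset"] },
        { "keyPhrases": ["key phrase extraction"] }
      ]
    }
  }
}
```

With this shape, /document/content/pages/*/keyPhrases refers to one string array per page, which is the nested structure the error in the question complains about. /document/content/pages/*/keyPhrases/* refers to the individual strings, so the target field would receive "cognitive search", "skillset" and "key phrase extraction" as a single collection.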

azure-cognitive-search

azure-search-.net-sdk