Azure 搜索 - 无法合并(使用技能)从 KeyPhraseExtractionSkill 获得的数据
Azure Search - Cannot merge (with skill) data obtained from the KeyPhraseExtractionSkill
我正在创建一个索引器,它获取一个文档,运行s KeyPhraseExtractionSkill 并将其输出回索引。
对于许多文档,这是开箱即用的。但是对于那些超过 50,000 条的记录,这不起作用。好的,没问题;这在文档中有明确说明。
文档建议使用 Text Split Skill。我所做的是使用文本拆分技能,将原始文档拆分为页面,将所有页面传递给 KeyPhraseExtractionSkill。然后我们需要将它们合并回来,因为我们最终会得到一个字符串数组。不幸的是,Merge Skill似乎不接受数组的数组,只是一个数组。
https://i.imgur.com/dBD4qgb.png <- Link 到技能组层次结构。
这是 Azure 报告的错误:
Required skill input was not of the expected type 'StringCollection'. Name: 'itemsToInsert', Source: '/document/content/pages/*/keyPhrases'. Expression language parsing issues:
我最终想要实现的是 运行 大于 50,000 的文本的 KeyPhraseExtractionSkill 最终将其添加回索引。
JSON 技能组合
"@odata.context": "https://-----------.search.windows.net/$metadata#skillsets/$entity",
"@odata.etag": "\"0x8D957466A2C1E47\"",
"name": "devalbertcollectionfilesskillset2",
"description": null,
"skills": [
{
"@odata.type": "#Microsoft.Skills.Text.SplitSkill",
"name": "SplitSkill",
"description": null,
"context": "/document/content",
"defaultLanguageCode": "en",
"textSplitMode": "pages",
"maximumPageLength": 1000,
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "textItems",
"targetName": "pages"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
"name": "EntityRecognitionSkill",
"description": null,
"context": "/document/content/pages/*",
"categories": [
"person",
"quantity",
"organization",
"url",
"email",
"location",
"datetime"
],
"defaultLanguageCode": "en",
"minimumPrecision": null,
"includeTypelessEntities": null,
"inputs": [
{
"name": "text",
"source": "/document/content/pages/*"
}
],
"outputs": [
{
"name": "persons",
"targetName": "people"
},
{
"name": "organizations",
"targetName": "organizations"
},
{
"name": "entities",
"targetName": "entities"
},
{
"name": "locations",
"targetName": "locations"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
"name": "KeyPhraseExtractionSkill",
"description": null,
"context": "/document/content/pages/*",
"defaultLanguageCode": "en",
"maxKeyPhraseCount": null,
"modelVersion": null,
"inputs": [
{
"name": "text",
"source": "/document/content/pages/*"
}
],
"outputs": [
{
"name": "keyPhrases",
"targetName": "keyPhrases"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.MergeSkill",
"name": "Merge Skill - keyPhrases",
"description": null,
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
{
"name": "itemsToInsert",
"source": "/document/content/pages/*/keyPhrases"
}
],
"outputs": [
{
"name": "mergedText",
"targetName": "keyPhrases"
}
]
}
],
"cognitiveServices": {
"@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
"key": "------",
"description": "/subscriptions/13abe1c6-d700-4f8f-916a-8d3bc17bb41e/resourceGroups/mde-dev-rg/providers/Microsoft.CognitiveServices/accounts/mde-dev-cognitive"
},
"knowledgeStore": null,
"encryptionKey": null
}```
Please let me know if there is anything else that I can add to improve the question. Thanks!
[1]: https://i.stack.imgur.com/GNf7F.png
您不必合并关键短语输出即可将它们插入索引。
假设您的索引已经有一个名为 mykeyphrases
的字段,类型为 Collection(Edm.String), to populate it with the key phrase outputs, add this indexer output field mapping:
"outputFieldMappings": [
...
{
"sourceFieldName": "/document/content/pages/*/keyPhrases/*",
"targetFieldName": "mykeyphrases"
},
...
]
sourceFieldName
末尾的 /*
对于扁平化字符串数组很重要。如果您想将字符串数组传递给另一个技能以进行其他丰富,这也将用作技能输入。
我正在创建一个索引器,它获取一个文档,运行s KeyPhraseExtractionSkill 并将其输出回索引。
对于许多文档,这是开箱即用的。但是对于那些超过 50,000 条的记录,这不起作用。好的,没问题;这在文档中有明确说明。
文档建议使用 Text Split Skill。我所做的是使用文本拆分技能,将原始文档拆分为页面,将所有页面传递给 KeyPhraseExtractionSkill。然后我们需要将它们合并回来,因为我们最终会得到一个字符串数组。不幸的是,Merge Skill似乎不接受数组的数组,只是一个数组。
https://i.imgur.com/dBD4qgb.png <- Link 到技能组层次结构。
这是 Azure 报告的错误:
Required skill input was not of the expected type 'StringCollection'. Name: 'itemsToInsert', Source: '/document/content/pages/*/keyPhrases'. Expression language parsing issues:
我最终想要实现的是 运行 大于 50,000 的文本的 KeyPhraseExtractionSkill 最终将其添加回索引。
JSON 技能组合
"@odata.context": "https://-----------.search.windows.net/$metadata#skillsets/$entity",
"@odata.etag": "\"0x8D957466A2C1E47\"",
"name": "devalbertcollectionfilesskillset2",
"description": null,
"skills": [
{
"@odata.type": "#Microsoft.Skills.Text.SplitSkill",
"name": "SplitSkill",
"description": null,
"context": "/document/content",
"defaultLanguageCode": "en",
"textSplitMode": "pages",
"maximumPageLength": 1000,
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "textItems",
"targetName": "pages"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
"name": "EntityRecognitionSkill",
"description": null,
"context": "/document/content/pages/*",
"categories": [
"person",
"quantity",
"organization",
"url",
"email",
"location",
"datetime"
],
"defaultLanguageCode": "en",
"minimumPrecision": null,
"includeTypelessEntities": null,
"inputs": [
{
"name": "text",
"source": "/document/content/pages/*"
}
],
"outputs": [
{
"name": "persons",
"targetName": "people"
},
{
"name": "organizations",
"targetName": "organizations"
},
{
"name": "entities",
"targetName": "entities"
},
{
"name": "locations",
"targetName": "locations"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
"name": "KeyPhraseExtractionSkill",
"description": null,
"context": "/document/content/pages/*",
"defaultLanguageCode": "en",
"maxKeyPhraseCount": null,
"modelVersion": null,
"inputs": [
{
"name": "text",
"source": "/document/content/pages/*"
}
],
"outputs": [
{
"name": "keyPhrases",
"targetName": "keyPhrases"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.MergeSkill",
"name": "Merge Skill - keyPhrases",
"description": null,
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
{
"name": "itemsToInsert",
"source": "/document/content/pages/*/keyPhrases"
}
],
"outputs": [
{
"name": "mergedText",
"targetName": "keyPhrases"
}
]
}
],
"cognitiveServices": {
"@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
"key": "------",
"description": "/subscriptions/13abe1c6-d700-4f8f-916a-8d3bc17bb41e/resourceGroups/mde-dev-rg/providers/Microsoft.CognitiveServices/accounts/mde-dev-cognitive"
},
"knowledgeStore": null,
"encryptionKey": null
}```
Please let me know if there is anything else that I can add to improve the question. Thanks!
[1]: https://i.stack.imgur.com/GNf7F.png
您不必合并关键短语输出即可将它们插入索引。
假设您的索引已经有一个名为 mykeyphrases
的字段,类型为 Collection(Edm.String), to populate it with the key phrase outputs, add this indexer output field mapping:
"outputFieldMappings": [
...
{
"sourceFieldName": "/document/content/pages/*/keyPhrases/*",
"targetFieldName": "mykeyphrases"
},
...
]
sourceFieldName
末尾的 /*
对于扁平化字符串数组很重要。如果您想将字符串数组传递给另一个技能以进行其他丰富,这也将用作技能输入。