Understanding scoring results: exact matches score lower than partial matches
I know Azure Search doesn't implement the Lucene Explain feature; if you'd like it too, you can vote for it here: https://feedback.azure.com/forums/263029-azure-search/suggestions/7379515-support-explain-api
Here is the index I created:
{
"name": "fieldvalue38gram",
"fields": [
{
"name": "FieldValueID",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": true,
"retrievable": true,
"searchable": false,
"sortable": false,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "FieldID",
"type": "Edm.Int32",
"facetable": false,
"filterable": true,
"retrievable": true,
"sortable": false,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "Text",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": "whitespace",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "partialName",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"retrievable": false,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": "ingram",
"searchAnalyzer": "whitespace",
"synonymMaps": [],
"fields": []
}
],
"suggesters": [],
"scoringProfiles": [
{
"name": "exactFirst",
"text": {
"weights": {
"Text": 2,
"partialName": 1
}
}
}
],
"defaultScoringProfile": "",
"corsOptions": null,
"analyzers": [
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "ingram",
"tokenizer": "whitespace",
"tokenFilters": [ "lowercase", "NGramTokenFilter" ],
"charFilters": []
}
],
"charFilters": [],
"tokenFilters": [
{
"@odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
"name": "NGramTokenFilter",
"minGram": 3,
"maxGram": 8
}
],
"tokenizers": []
}
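For reference, here is a minimal Python sketch of what the ingram analyzer defined above emits at index time: whitespace tokenization, lowercasing, then every n-gram of 3 to 8 characters (minGram and maxGram). It only approximates Lucene's NGramTokenFilter and ignores token positions and offsets; ingram_tokens is a hypothetical helper name.

```python
def ingram_tokens(text: str, min_gram: int = 3, max_gram: int = 8) -> list[str]:
    """Approximate the custom 'ingram' analyzer: whitespace tokenizer,
    lowercase filter, then n-grams of min_gram..max_gram characters."""
    grams = []
    for token in text.split():                           # whitespace tokenizer
        token = token.lower()                            # lowercase token filter
        for start in range(len(token)):                  # every start offset...
            for size in range(min_gram, max_gram + 1):   # ...every allowed gram size
                if start + size <= len(token):
                    grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    for value in ("BLACK", "BLACKSMITH"):
        tokens = ingram_tokens(value)
        print(value, len(tokens), "black" in tokens)
```

Both BLACK and BLACKSMITH yield the gram black, so the query term can match either document through partialName; BLACKSMITH simply produces more grams (33 versus 6), and none of them is the full 10-character word.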
When I query with search=black:
indexes/fieldvalue38gram/docs?api-version={{version}}&scoringProfile=exactFirst&$top=21&search=black
I end up with:
{
"@search.score": 4.051315,
"FieldValueID": "167402",
"FieldID": 8,
"Text": "BLACKSMITH",
"partialName": "BLACKSMITH"
},
{
"@search.score": 3.9905946,
"FieldValueID": "18594",
"FieldID": 8,
"Text": "BLACK",
"partialName": "BLACK"
},
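This is not what I expected.
I expected the exact match to get a boost. Also, from reading the documentation, I understand that field length plays an important role in scoring, meaning that shorter text is given a higher score as part of indexing.
With that in mind, I don't understand why the second result scores lower than the first.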
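The length expectation can be illustrated with a simplified form of Lucene's classic TF-IDF scoring, assuming that is the similarity the service uses (it was the default for Azure Search services at the time, but check your own service). Each matching term contributes roughly sqrt(tf) × idf² × 1/sqrt(field length). The sketch ignores query normalization, field weights and the lossy one-byte norm encoding, uses one variant of the classic idf formula, and the document counts are made-up illustrative numbers; classic_term_score is a hypothetical name.

```python
import math


def classic_term_score(tf: int, doc_freq: int, num_docs: int, field_length: int) -> float:
    """Simplified Lucene classic TF-IDF contribution of a single query term.

    Ignores queryNorm, coord, field weights and the lossy one-byte norm encoding.
    """
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))  # one variant of classic idf
    tf_factor = math.sqrt(tf)                        # classic tf: sqrt(frequency)
    length_norm = 1.0 / math.sqrt(field_length)      # fewer terms -> larger norm
    return tf_factor * idf * idf * length_norm


# Hypothetical statistics: the same single match, once in a 1-term field
# and once in a 3-term field, with identical index-wide counts.
short_field = classic_term_score(tf=1, doc_freq=50, num_docs=1000, field_length=1)
long_field = classic_term_score(tf=1, doc_freq=50, num_docs=1000, field_length=3)
print(f"1-term field: {short_field:.3f}  3-term field: {long_field:.3f}")
```

With everything else equal, the only difference is the length norm, so the shorter field scores sqrt(3) times higher; a result that contradicts this points at some other input (different term statistics or a different matching field) rather than field length.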
- Can anyone explain the scoring in this case?
- Is there anything I can do to help understand the scoring?
Thanks
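Update
2019-10-24
Here is an example of the scoring I have been struggling with. Apart from the doc id (FieldValueID), the first and third entries are identical. I can't find any rhyme or reason for the difference in scores.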
{
"value": [
{
"@search.score": 0.10707458,
"FieldValueID": "2",
"FieldID": 2,
"Text": "Another Brown2Black Cow"
},
{
"@search.score": 0.021882897,
"FieldValueID": "4",
"FieldID": 2,
"Text": "Brown"
},
{
"@search.score": 0.017285194,
"FieldValueID": "7",
"FieldID": 2,
"Text": "Another Brown2Black Cow"
}
]
}
2019-10-25
Just found this: https://docs.microsoft.com/en-us/azure/search/search-lucene-query-architecture#scoring-in-a-distributed-index
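The linked page explains that scoring happens per shard: term statistics such as document frequency come from the shard that holds the document, not from the whole index. A minimal sketch of that effect, with two hypothetical shards and made-up counts, shows how two identical documents (like FieldValueID 2 and 7) could receive different scores. The classic idf variant and the shard names are assumptions for illustration only.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """One variant of Lucene's classic idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for one query term. Each shard only
# knows its own document count and how many of its documents contain the term.
shards = {
    "shard_a": {"num_docs": 500, "doc_freq": 5},    # term is rare on this shard
    "shard_b": {"num_docs": 500, "doc_freq": 120},  # term is common on this shard
}

# Identical document text gets whatever idf its own shard computes;
# classic TF-IDF uses idf squared in the term weight.
per_shard_weight = {
    name: classic_idf(s["doc_freq"], s["num_docs"]) ** 2
    for name, s in shards.items()
}
print(per_shard_weight)
```

When the documents themselves are identical, shard-level statistics are the remaining input that differs, which would be consistent with the gap between FieldValueID 2 and 7.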
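My guess is that this is because you are forcing the Text field to use the whitespace analyzer instead of the default analyzer. I don't believe the whitespace analyzer lowercases your terms. Since your search query and the Text field use different casing, I'm not sure they match at all. You could try searching with different casing and see what comes back. The same goes for the search analyzer: I'd suggest not using just the whitespace analyzer there either.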
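The casing point can be checked with a small Python sketch that mimics the analyzers in the index: whitespace only splits on whitespace, while the ingram index analyzer also lowercases before producing n-grams. It assumes the whitespace analyzer does no case folding and simplifies Lucene's behavior; whitespace_analyzer, ingram_index_analyzer and matches are hypothetical helper names.

```python
def whitespace_analyzer(text: str) -> set[str]:
    """Split on whitespace only; case is preserved (assumed whitespace behavior)."""
    return set(text.split())


def ingram_index_analyzer(text: str, min_gram: int = 3, max_gram: int = 8) -> set[str]:
    """Whitespace tokenizer + lowercase + n-grams, like the 'ingram' analyzer."""
    grams = set()
    for token in text.split():
        token = token.lower()
        for size in range(min_gram, max_gram + 1):
            for start in range(len(token) - size + 1):
                grams.add(token[start:start + size])
    return grams


def matches(query: str, indexed_terms: set[str]) -> bool:
    """True if any query term (analyzed with 'whitespace') is an indexed term."""
    return bool(whitespace_analyzer(query) & indexed_terms)


stored = "BLACK"
text_terms = whitespace_analyzer(stored)        # Text: analyzer = whitespace
partial_terms = ingram_index_analyzer(stored)   # partialName: indexAnalyzer = ingram

for query in ("black", "BLACK"):
    print(query, "Text:", matches(query, text_terms),
          "partialName:", matches(query, partial_terms))
```

Under these assumptions the lowercase query black never hits the Text field, so the Text: 2 weight in exactFirst cannot lift the exact match, while BLACK hits Text but misses the lowercased grams in partialName. Lowercasing consistently at both index and query time would remove that asymmetry.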