Understanding scoring results - exact scores lower than partial

I know Azure Search doesn't implement Lucene's Explain feature. If you'd like it too, you can vote for it here: https://feedback.azure.com/forums/263029-azure-search/suggestions/7379515-support-explain-api

Here is the index I created:

{
  "name": "fieldvalue38gram",
  "fields": [
    {
      "name": "FieldValueID",
      "type": "Edm.String",
      "facetable": false,
      "filterable": false,
      "key": true,
      "retrievable": true,
      "searchable": false,
      "sortable": false,
      "analyzer": null,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "synonymMaps": [],
      "fields": []
    },
    {
      "name": "FieldID",
      "type": "Edm.Int32",
      "facetable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "analyzer": null,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "synonymMaps": [],
      "fields": []
    },
    {
      "name": "Text",
      "type": "Edm.String",
      "facetable": false,
      "filterable": true,
      "retrievable": true,
      "searchable": true,
      "sortable": true,
      "analyzer": "whitespace",
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "synonymMaps": [],
      "fields": []
    },
    {
      "name": "partialName",
      "type": "Edm.String",
      "facetable": false,
      "filterable": true,
      "retrievable": false,
      "searchable": true,
      "sortable": true,
      "analyzer": null,
      "indexAnalyzer": "ingram",
      "searchAnalyzer": "whitespace",
      "synonymMaps": [],
      "fields": []
    }
  ],
  "suggesters": [],
  "scoringProfiles": [
    {
      "name": "exactFirst",
      "text": {
        "weights": {
          "Text": 2,
          "partialName": 1
        }

      }
    }
  ],
  "defaultScoringProfile": "",
  "corsOptions": null,
  "analyzers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "ingram",
      "tokenizer": "whitespace",
      "tokenFilters": [ "lowercase", "NGramTokenFilter" ],
      "charFilters": []
    }
  ],
  "charFilters": [],
  "tokenFilters": [
    {
      "@odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
      "name": "NGramTokenFilter",
      "minGram": 3,
      "maxGram": 8
    }
  ],
  "tokenizers": []
}
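To see what the `ingram` analyzer actually puts into `partialName`, here is a minimal Python sketch that re-implements its chain: whitespace tokenizer, then `lowercase`, then `NGramTokenFilter` with `minGram` 3 and `maxGram` 8. The function `ngram_analyze` is hypothetical and written only for illustration; it is not Azure Search's implementation and does not reproduce Lucene's token order or position increments.

```python
from typing import List


def ngram_analyze(text: str, min_gram: int = 3, max_gram: int = 8) -> List[str]:
    """Approximate the tokens the 'ingram' analyzer indexes for a value."""
    tokens = []
    for word in text.split():      # whitespace tokenizer
        word = word.lower()        # "lowercase" token filter
        # NGramTokenFilter: every substring of length min_gram..max_gram
        for size in range(min_gram, max_gram + 1):
            for start in range(len(word) - size + 1):
                tokens.append(word[start:start + size])
    return tokens


if __name__ == "__main__":
    for value in ("BLACK", "BLACKSMITH"):
        grams = ngram_analyze(value)
        print(value, len(grams), "black" in grams)
```

Both values produce the gram `black`, so both documents match `search=black` through `partialName`; `BLACKSMITH` simply yields many more grams (33 versus 6).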

When I query with search=black:

indexes/fieldvalue38gram/docs?api-version={{version}}&scoringProfile=exactFirst&$top=21&search=black
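The same request can be assembled with Python's standard library. This is a sketch only: the service name, API version, and key below are hypothetical placeholders (the original uses a `{{version}}` variable), and the code builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholders: substitute your own service, API version and key.
SERVICE = "my-search-service"
API_VERSION = "2019-05-06"
API_KEY = "<query-key>"


def build_search_url(service, index, api_version, search, scoring_profile=None, top=None):
    """Build a GET URL for an index's docs (search) endpoint."""
    params = {"api-version": api_version}
    if scoring_profile is not None:
        params["scoringProfile"] = scoring_profile
    if top is not None:
        params["$top"] = top
    params["search"] = search
    # safe="$" keeps the "$top" key readable instead of encoding it as "%24top".
    query = urlencode(params, safe="$")
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


url = build_search_url(SERVICE, "fieldvalue38gram", API_VERSION, "black",
                       scoring_profile="exactFirst", top=21)
# Build (but do not send) the request; the key goes in the "api-key" header.
request = Request(url, headers={"api-key": API_KEY})
print(url)
```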

I end up with:

{
    "@search.score": 4.051315,
    "FieldValueID": "167402",
    "FieldID": 8,
    "Text": "BLACKSMITH",
    "partialName": "BLACKSMITH"
},
{
    "@search.score": 3.9905946,
    "FieldValueID": "18594",
    "FieldID": 8,
    "Text": "BLACK",
    "partialName": "BLACK"
},

This is not what I expected.

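I expected the exact match to get a boost, since the exactFirst profile weights Text at 2 and partialName at 1. Also, from reading the documentation, I understand that field length plays a major role in scoring: shorter text gets a higher length factor, which is computed at indexing time.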
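One detail matters for the length argument: in Lucene's classic TF-IDF similarity, the length factor is 1/sqrt(number of terms in the field), so it counts terms, not characters. Below is a minimal sketch, assuming the index uses that classic similarity and looking only at the whitespace-analyzed `Text` field; it does not model how the n-gram tokens in `partialName` are counted.

```python
import math


def whitespace_terms(value):
    # The whitespace analyzer splits on whitespace only; it never lowercases.
    return value.split()


def classic_length_norm(value):
    # Classic Lucene TF-IDF: lengthNorm = 1 / sqrt(number of terms in the field).
    # Lucene stores this lossily in one byte per document; ignored in this sketch.
    return 1.0 / math.sqrt(len(whitespace_terms(value)))


for value in ("BLACK", "BLACKSMITH", "Another Brown2Black Cow"):
    print(f"{value!r}: {classic_length_norm(value):.3f}")
```

`BLACK` and `BLACKSMITH` are each a single term in `Text`, so under this model they get the same length factor; having fewer characters alone does not favor `BLACK`.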

With that in mind, I don't understand why the second result scores lower than the first.

Thanks

Update

2019-10-24
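Here is an example of the scoring I've been struggling with. Apart from the doc ID (FieldValueID), the first and third entries are identical. I can't find any rhyme or reason for the difference in scores.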

{
    "value": [
        {
            "@search.score": 0.10707458,
            "FieldValueID": "2",
            "FieldID": 2,
            "Text": "Another Brown2Black Cow"
        },
        {
            "@search.score": 0.021882897,
            "FieldValueID": "4",
            "FieldID": 2,
            "Text": "Brown"
        },
        {
            "@search.score": 0.017285194,
            "FieldValueID": "7",
            "FieldID": 2,
            "Text": "Another Brown2Black Cow"
        }
    ]
}

2019-10-25
Just found this: https://docs.microsoft.com/en-us/azure/search/search-lucene-query-architecture#scoring-in-a-distributed-index

And this: https://docs.microsoft.com/en-us/azure/search/search-capacity-planning#partition-and-replica-combinations
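The distributed-scoring page linked above explains that scores are computed per shard, using that shard's own term statistics. The sketch below assumes Lucene's classic TF-IDF formulas and uses hypothetical shard sizes and document frequencies. It shows how two identical documents, like FieldValueID 2 and 7, can end up with different scores purely because they live on different shards.

```python
import math


def classic_idf(doc_freq, doc_count):
    # Lucene ClassicSimilarity: idf = ln((docCount + 1) / (docFreq + 1)) + 1
    return math.log((doc_count + 1) / (doc_freq + 1)) + 1.0


def term_score(tf, doc_freq, doc_count, length_norm):
    # Simplified classic TF-IDF term score: sqrt(tf) * idf^2 * lengthNorm.
    # queryNorm, coord and scoring-profile weights are left out of this sketch.
    idf = classic_idf(doc_freq, doc_count)
    return math.sqrt(tf) * idf * idf * length_norm


# Hypothetical per-shard statistics for the query term: each shard only knows
# its own document count and how many of its documents contain the term.
shards = {
    "shard A": {"doc_count": 100, "doc_freq": 2},
    "shard B": {"doc_count": 100, "doc_freq": 20},
}

# The same document text (same tf and length norm) indexed on each shard.
tf, length_norm = 1, 1 / math.sqrt(3)
for name, stats in shards.items():
    score = term_score(tf, stats["doc_freq"], stats["doc_count"], length_norm)
    print(f"{name}: {score:.4f}")
```

Only the shard-local document frequency differs, yet the score for identical text changes, because each shard computes IDF from its own statistics.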

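My guess is that this happens because you force the Text field to use the whitespace analyzer instead of the default analyzer. The whitespace analyzer only splits on whitespace; it does not lowercase your terms. Since your search query and the Text field use different casing, I'm not sure they match at all. Try searching with different casing and see what comes back. The same applies to the search analyzer: I'd suggest not simply using the whitespace analyzer there either.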
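The casing point can be checked with a small model of term matching. This is a simplified sketch, assuming a query matches when a search-time token exactly equals (case-sensitively) an index-time token; `lowercase_analyze` is a hypothetical stand-in for an analyzer that lowercases, such as the default standard analyzer, not its full implementation.

```python
def whitespace_analyze(text):
    # Whitespace analyzer: split on whitespace, keep the original casing.
    return text.split()


def lowercase_analyze(text):
    # Hypothetical stand-in for a lowercasing analyzer (e.g. the default one).
    return [token.lower() for token in text.split()]


def matches(query, value, index_analyzer, search_analyzer):
    # Terms are compared exactly, so casing has to agree after analysis.
    indexed = set(index_analyzer(value))
    return any(term in indexed for term in search_analyzer(query))


# Text field as configured: whitespace at index and search time.
print(matches("black", "BLACK", whitespace_analyze, whitespace_analyze))  # False
# Lowercasing on both sides: the casing difference disappears.
print(matches("black", "BLACK", lowercase_analyze, lowercase_analyze))    # True
# Lowercased index terms but a whitespace search analyzer (like partialName):
print(matches("BLACK", "BLACK", lowercase_analyze, whitespace_analyze))   # False
```

Under this model, the first case suggests why `search=black` may never hit `Text`, in which case the `exactFirst` weight on `Text` would not apply and both documents would be scored only through `partialName`.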