Lucene documents scoring/ranking with regex query
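I'm using Azure Search, but I assume my question is more related to Lucene.
I can't find any information on how document ranking (score) is computed when the query consists fully or partially of regular expressions. For example: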
Searching for "microsoft" returns normally computed scores:
{ score: 6.088776, name: "Microsoft Research" }
{ score: 5.9090853, name: "Microsoft Corporation" }
{ score: 5.0747375, name: "Microsoft Philippines, Inc." }
{ score: 4.93202, name: "Microsoft Dynamics, Inc." }
Searching for "/.*micro.*/" returns scores equal to 1:
{ score: 1, name: "Microsoft Dynamics, Inc." }
{ score: 1, name: "Microsoft Philippines, Inc." }
{ score: 1, name: "Microsoft Startup Alley" }
And searching for "microsoft /.*micro.*/" returns what I assume is the sum of the "microsoft" term score and the /.*micro.*/ term score (which is always 1):
{ score: 5.2132897, name: "Microsoft Research" }
{ score: 5.198583, name: "Microsoft Corporation" }
{ score: 4.973414, name: "Microsoft Philippines, Inc." }
What I need is to run a query made entirely of regular expressions and still get computed scores.
In Azure Search, wildcard search queries (such as prefix, regex, and fuzzy queries) go through an internal query rewriting process and return a constant score. This is mostly for performance reasons, and also to keep our default term-frequency-based scoring (TF-IDF) from favoring matches on rarer unique terms. The behavior is documented at https://docs.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search#bkmk_searchscoreforwildcardandregexqueries. There currently isn't a way to change this default behavior. If you feel that the feature is important, please create an entry in our user voice (https://feedback.azure.com/forums/263029-azure-search) to help us prioritize it. Thanks.
Nate
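To make the answer's point concrete, here is a toy Python model; it is not Azure Search or Lucene code. It contrasts a constant-score rewrite (every document matching the regex gets the same score, as described above) with a hypothetical "scoring" rewrite that expands the regex into an OR of the matching indexed terms and scores each with TF-IDF. Assumptions: the index contents (including the "Micron Technology" document) are made up for illustration; the idf formula is loosely modeled on Lucene's classic TF-IDF similarity, with no field norms or coordination factor; and Python's `re.fullmatch` stands in for Lucene's regex engine, which also matches whole terms and agrees with it for this pattern.

```python
import math
import re

# Hypothetical toy index: document name -> analyzed terms of the "name" field.
INDEX = {
    "Microsoft Research": ["microsoft", "research"],
    "Microsoft Corporation": ["microsoft", "corporation"],
    "Microsoft Startup Alley": ["microsoft", "startup", "alley"],
    "Micron Technology": ["micron", "technology"],
}


def expand(pattern, index):
    """Rewrite step: collect every indexed term the regex matches in full."""
    rx = re.compile(pattern)
    vocab = {t for terms in index.values() for t in terms}
    return sorted(t for t in vocab if rx.fullmatch(t))


def idf(term, index):
    """Classic-style idf (an assumption): 1 + ln(numDocs / (docFreq + 1))."""
    df = sum(term in terms for terms in index.values())
    return 1.0 + math.log(len(index) / (df + 1))


def constant_score(pattern, index, boost=1.0):
    """Constant-score rewrite: every matching document gets the query boost."""
    terms = set(expand(pattern, index))
    return {name: boost for name, doc in index.items() if terms & set(doc)}


def tfidf_score(pattern, index):
    """Hypothetical scoring rewrite: OR of expanded terms, sqrt(tf) * idf each."""
    scores = {}
    for term in expand(pattern, index):
        weight = idf(term, index)
        for name, doc in index.items():
            tf = doc.count(term)
            if tf:
                scores[name] = scores.get(name, 0.0) + math.sqrt(tf) * weight
    return scores


if __name__ == "__main__":
    print(constant_score(r".*micro.*", INDEX))
    ranked = sorted(tfidf_score(r".*micro.*", INDEX).items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{score:.4f}  {name}")
```

In this toy index, the constant-score version gives all four documents 1.0. The TF-IDF version ranks "Micron Technology" first, because "micron" occurs in only one document and therefore has a higher idf than "microsoft". That is the kind of bias toward rarer expanded terms that the answer says the constant score is meant to avoid.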