使一个完整的单词比 Edge NGram 子集有更多的分数
Make a full word have more score than a Edge NGram subset
我正在尝试在全名匹配的文档上获得更高的分数,而不是具有相同值的 Edge NGram 子集。
所以结果是:
Pos Name _score _id
1 Baritone horn 7.56878 1786
2 Baritone ukulele 7.56878 2313
3 Bari 7.56878 2360
4 Baritone voice 7.56878 1787
我本来打算第三个 ("Bari") 会有更高的分数,因为它是全名,但是,由于边缘 ngram 分解将使所有其他人都具有 "bari" 单词索引。那么您是否可以在结果中看到 table,所有人的分数都相等,我什至不知道 elasticsearch 是如何排序的,因为 _id 甚至不是连续的,也不是排序的名称。
我怎样才能做到这一点?
谢谢
示例'code'
设置
{
"analysis": {
"filter": {
"edgeNGram_filter": {
"type": "edgeNGram",
"min_gram": 3,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"edgeNGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"edgeNGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
}
映射:
{
"name": {
"type": "string",
"index": "not_analyzed"
},
"suggest": {
"type": "completion",
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer",
"payloads": true
}
}
查询:
POST /attribute-tree/attribute/_search
{
"query": {
"match": {
"suggest": "Bari"
}
}
}
结果:
(只留下相关数据)
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 7.56878,
"hits": [
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "1786",
"_score": 7.56878,
"_source": {
"name": "Baritone horn",
"suggest": {
"input": [
"Baritone",
"horn"
],
"output": "Baritone horn"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "2313",
"_score": 7.56878,
"_source": {
"name": "Baritone ukulele",
"suggest": {
"input": [
"Baritone",
"ukulele"
],
"output": "Baritone ukulele"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "2360",
"_score": 7.56878,
"_source": {
"name": "Bari",
"suggest": {
"input": [
"Bari"
],
"output": "Bari"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "1787",
"_score": 7.568078,
"_source": {
"name": "Baritone voice",
"suggest": {
"input": [
"Baritone",
"voice"
],
"output": "Baritone voice"
}
}
}
]
}
}
您可以使用 bool
查询运算符及其 should
子句为精确匹配添加分数,如下所示:
POST /attribute-tree/attribute/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"suggest": "Bari"
}
}
],
"should": [
{
"match": {
"name": "Bari"
}
}
]
}
}
}
should 子句中的查询在 ElasticSearch definitive guide 中称为 signal 子句,这就是区分完美匹配和 ngram 匹配的方式。您将拥有与 must 子句匹配的所有文档,但是由于 bool
查询评分公式,匹配 should
查询的文档将获得更高的分数:
score = ("must" queries total score + matching "should" queries total score) / (total number of "must" queries and "should" queries)
结果如你所愿,巴里第一(得分遥遥领先:)):
"hits": {
"total": 3,
"max_score": 0.4339554,
"hits": [
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "2360",
"_score": 0.4339554,
"_source": {
"name": "Bari",
"suggest": {
"input": [
"Bari"
],
"output": "Bari"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "1786",
"_score": 0.04500804,
"_source": {
"name": "Baritone horn",
"suggest": {
"input": [
"Baritone",
"horn"
],
"output": "Baritone horn"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "2313",
"_score": 0.04500804,
"_source": {
"name": "Baritone ukulele",
"suggest": {
"input": [
"Baritone",
"ukulele"
],
"output": "Baritone ukulele"
}
}
}
]
我正在尝试在全名匹配的文档上获得更高的分数,而不是具有相同值的 Edge NGram 子集。
所以结果是:
Pos Name _score _id
1 Baritone horn 7.56878 1786
2 Baritone ukulele 7.56878 2313
3 Bari 7.56878 2360
4 Baritone voice 7.56878 1787
我本来打算第三个 ("Bari") 会有更高的分数,因为它是全名,但是,由于边缘 ngram 分解将使所有其他人都具有 "bari" 单词索引。那么您是否可以在结果中看到 table,所有人的分数都相等,我什至不知道 elasticsearch 是如何排序的,因为 _id 甚至不是连续的,也不是排序的名称。
我怎样才能做到这一点?
谢谢
示例'code'
设置
{
"analysis": {
"filter": {
"edgeNGram_filter": {
"type": "edgeNGram",
"min_gram": 3,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"edgeNGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"edgeNGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
}
映射:
{
"name": {
"type": "string",
"index": "not_analyzed"
},
"suggest": {
"type": "completion",
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer",
"payloads": true
}
}
查询:
POST /attribute-tree/attribute/_search
{
"query": {
"match": {
"suggest": "Bari"
}
}
}
结果:
(只留下相关数据)
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 7.56878,
"hits": [
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "1786",
"_score": 7.56878,
"_source": {
"name": "Baritone horn",
"suggest": {
"input": [
"Baritone",
"horn"
],
"output": "Baritone horn"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "2313",
"_score": 7.56878,
"_source": {
"name": "Baritone ukulele",
"suggest": {
"input": [
"Baritone",
"ukulele"
],
"output": "Baritone ukulele"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "2360",
"_score": 7.56878,
"_source": {
"name": "Bari",
"suggest": {
"input": [
"Bari"
],
"output": "Bari"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "1787",
"_score": 7.568078,
"_source": {
"name": "Baritone voice",
"suggest": {
"input": [
"Baritone",
"voice"
],
"output": "Baritone voice"
}
}
}
]
}
}
您可以使用 bool
查询运算符及其 should
子句为精确匹配添加分数,如下所示:
POST /attribute-tree/attribute/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"suggest": "Bari"
}
}
],
"should": [
{
"match": {
"name": "Bari"
}
}
]
}
}
}
should 子句中的查询在 ElasticSearch definitive guide 中称为 signal 子句,这就是区分完美匹配和 ngram 匹配的方式。您将拥有与 must 子句匹配的所有文档,但是由于 bool
查询评分公式,匹配 should
查询的文档将获得更高的分数:
score = ("must" queries total score + matching "should" queries total score) / (total number of "must" queries and "should" queries)
结果如你所愿,巴里第一(得分遥遥领先:)):
"hits": {
"total": 3,
"max_score": 0.4339554,
"hits": [
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "2360",
"_score": 0.4339554,
"_source": {
"name": "Bari",
"suggest": {
"input": [
"Bari"
],
"output": "Bari"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "1786",
"_score": 0.04500804,
"_source": {
"name": "Baritone horn",
"suggest": {
"input": [
"Baritone",
"horn"
],
"output": "Baritone horn"
}
}
},
{
"_index": "attribute-tree",
"_type": "attribute",
"_id": "2313",
"_score": 0.04500804,
"_source": {
"name": "Baritone ukulele",
"suggest": {
"input": [
"Baritone",
"ukulele"
],
"output": "Baritone ukulele"
}
}
}
]