Elasticsearch match score
My index has a simple field of type "text":
"keywordName": {
  "type": "text"
}
I have indexed these documents: "samsung", "samsung galaxy", "samsung cover", "samsung charger".
When I run a simple "match" query, the results are surprising:
Query:
GET keywords/_search
{
  "query": {
    "match": {
      "keywordName": "samsung"
    }
  }
}
Result:
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1.113083,
    "hits": [
      {
        "_index": "keywords",
        "_type": "keyword",
        "_id": "samsung galaxy",
        "_score": 1.113083,
        "_source": {
          "keywordName": "samsung galaxy"
        }
      },
      {
        "_index": "keywords",
        "_type": "keyword",
        "_id": "samsung charger",
        "_score": 0.9433406,
        "_source": {
          "keywordName": "samsung charger"
        }
      },
      {
        "_index": "keywords",
        "_type": "keyword",
        "_id": "samsung",
        "_score": 0.8405092,
        "_source": {
          "keywordName": "samsung"
        }
      },
      {
        "_index": "keywords",
        "_type": "keyword",
        "_id": "samsung cover",
        "_score": 0.58279467,
        "_source": {
          "keywordName": "samsung cover"
        }
      }
    ]
  }
}
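First question: why doesn't "samsung" get the highest score?
Second question: how can I write the query, or set up the analysis, so that "samsung" gets the highest score?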
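On the first question, one observation that can be read off the result itself: "samsung galaxy" and "samsung cover" have the same length and the same term frequency, yet they score 1.113083 and 0.58279467. With BM25-style scoring, that can only happen if the two documents were scored against different index statistics, which is what happens when they sit on different shards (the response reports 5). The sketch below is a minimal illustration of that effect, not the exact Lucene implementation: it assumes the textbook BM25 formula with the default k1 = 1.2 and b = 0.75, and the shard statistics (`shard_a`, `shard_b`) are made-up numbers, not values taken from this index.

```python
import math


def bm25(freq, doc_len, avg_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document (illustrative only)."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * doc_len / avg_len))
    return idf * tf_norm


# Hypothetical per-shard statistics: the term is rare on shard A, common on shard B.
shard_a = dict(doc_count=10, doc_freq=1, avg_len=2.0)
shard_b = dict(doc_count=10, doc_freq=5, avg_len=2.0)

# Two documents of identical shape (one occurrence, two tokens) on different shards.
score_a = bm25(freq=1, doc_len=2, **shard_a)
score_b = bm25(freq=1, doc_len=2, **shard_b)

# The same four documents scored against shared statistics:
# all 4 contain the term, average length is (1 + 2 + 2 + 2) / 4 = 1.75 tokens.
score_short = bm25(freq=1, doc_len=1, avg_len=1.75, doc_count=4, doc_freq=4)
score_long = bm25(freq=1, doc_len=2, avg_len=1.75, doc_count=4, doc_freq=4)

print("different shards:", score_a, score_b)
print("shared statistics:", score_short, score_long)
```

With shared statistics the one-token document comes out ahead of the two-token ones, which is the ordering the question expects. If shard-local statistics are indeed the cause here, testing on a single-shard index, or searching with `search_type=dfs_query_then_fetch`, would be the usual way to confirm it.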
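Starting from the same index settings (analyzer, filters, mapping) that I used earlier, I suggest the following solution. As I mentioned, though, you need to work out all the requirements for what you need to search in this index, and treat all of this as one complete solution.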
DELETE test

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_stop": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_stop",
            "my_snow",
            "asciifolding"
          ]
        }
      },
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_french_"
        },
        "my_snow": {
          "type": "snowball",
          "language": "French"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "keywordName": {
          "type": "text",
          "analyzer": "custom_stop",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

POST /test/test/_bulk
{"index":{}}
{"keywordName":"samsung galaxy"}
{"index":{}}
{"keywordName":"samsung charger"}
{"index":{}}
{"keywordName":"samsung cover"}
{"index":{}}
{"keywordName":"samsung"}

GET /test/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "keywordName": {
              "query": "samsungs",
              "operator": "and"
            }
          }
        },
        {
          "term": {
            "keywordName.raw": {
              "value": "samsungs"
            }
          }
        },
        {
          "fuzzy": {
            "keywordName.raw": {
              "value": "samsungs",
              "fuzziness": 1
            }
          }
        }
      ]
    }
  },
  "size": 10
}
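To make the role of the two clauses on `keywordName.raw` concrete, here is a minimal Python sketch of which stored values they would match for the query string "samsungs". It assumes the `keyword` sub-field holds each value as one unmodified token, and it uses plain Levenshtein distance; Elasticsearch's fuzzy query also treats a transposition as a single edit by default, which makes no difference for these four values. The helper `edit_distance` and the variable names are illustrative, not part of any Elasticsearch API.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Values as they would sit in keywordName.raw (whole string, not tokenized).
raw_values = ["samsung galaxy", "samsung charger", "samsung cover", "samsung"]
query = "samsungs"

# "term" clause: exact match on the raw value.
term_hits = [v for v in raw_values if v == query]

# "fuzzy" clause with fuzziness 1: raw value within one edit of the query.
fuzzy_hits = [v for v in raw_values if edit_distance(query, v) <= 1]

print("term:", term_hits)
print("fuzzy:", fuzzy_hits)
```

Only the bare "samsung" value is within one edit of the query, so it is the only document that can collect the extra score from the fuzzy clause on top of whatever the `match` clause contributes. Since the scores of matching `should` clauses are added together, that extra contribution is what can lift it above the longer values.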