Elasticsearch analyzer on a text field
Here is my field in Elasticsearch:
"keywordName": {
  "type": "text",
  "analyzer": "custom_stop"
}
Here is my analyzer:
"custom_stop": {
  "type": "custom",
  "tokenizer": "standard",
  "filter": [
    "my_stop",
    "my_snow",
    "asciifolding"
  ]
}
Here are my filters:
"my_stop": {
  "type": "stop",
  "stopwords": "_french_"
},
"my_snow": {
  "type": "snowball",
  "language": "French"
}
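To see which tokens this filter chain actually produces for a given input, the _analyze API can be run against the index. This is a minimal sketch, assuming the index is named ads (as in the search request further down) and that custom_stop is defined in that index's settings; the sample text is just one of the inputs discussed in this question.

```json
GET ads/_analyze
{
  "analyzer": "custom_stop",
  "text": "Cannes à Pêche"
}
```

Comparing the tokens returned for the query text with the tokens returned for a stored value such as "canne a peche" shows whether stop-word removal, stemming, and accent folding behave as expected.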
Here are the documents in my index (in my only field, keywordName):
"canne a peche", "canne", "canne a peche telescopique", "iphone 8", "iphone 8 case", "iphone 8 cover", "iphone 8 charger", "iphone 8 new"
When I search for "canne", it returns the "canne" document, which is what I want:
GET ads/_search
{
  "query": {
    "match": {
      "keywordName": {
        "query": "canne",
        "operator": "and"
      }
    }
  },
  "size": 1
}
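When I search for "canne à pêche", it returns "canne a peche", which is also fine. The same goes for "Cannes à Pêche" -> "canne a peche" -> OK.
Here is the tricky part: when I search for "iphone 8", it returns "iphone 8 cover" instead of "iphone 8". If I change size to 5 (because 5 results contain "iphone 8"), I see that "iphone 8" is only the fourth result by score. First comes "iphone 8 cover", then "iphone 8 case", then "iphone 8 new", and only then "iphone 8"...
Here is the query result: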
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.4009607,
    "hits": [
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 cover",
        "_score": 1.4009607,
        "_source": {
          "keywordName": "iphone 8 cover"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 case",
        "_score": 1.4009607,
        "_source": {
          "keywordName": "iphone 8 case"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 new",
        "_score": 0.70293105,
        "_source": {
          "keywordName": "iphone 8 new"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8",
        "_score": 0.5804671,
        "_source": {
          "keywordName": "iphone 8"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 charge",
        "_score": 0.46705723,
        "_source": {
          "keywordName": "iphone 8 charge"
        }
      }
    ]
  }
}
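How can I keep the flexibility I have for a keyword like "canne a peche" (accents, capital letters, plurals) while also telling Elasticsearch that, if there is an exact match ("iphone 8" = "iphone 8"), it should return that exact keywordName first?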
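The match query ranks hits by a relevance score (TF/IDF in older versions, BM25 by default since Elasticsearch 5.0). That means you get loosely matched results ordered by term statistics, not by how exactly a document equals the query. If you want the exact match when there is one, run an exact query first (a query_string case) and, if it returns nothing, fall back to your match query.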
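Because the score depends on term statistics, and those statistics are gathered per shard by default, an index with five shards and only a handful of documents can produce uneven scores for very similar documents. Whether that explains the ordering above is an assumption the question does not confirm. One way to check, sketched here, is to rerun the same match query with global statistics and compare the order:

```json
GET ads/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "match": {
      "keywordName": {
        "query": "iphone 8",
        "operator": "and"
      }
    }
  },
  "size": 5
}
```

If "iphone 8" moves up with dfs_query_then_fetch, the per-shard statistics were at least part of the problem. An explicit exact-match clause is still the more predictable way to pin the exact keyword to the top.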
I suggest this:
"keywordName": {
  "type": "text",
  "analyzer": "custom_stop",
  "fields": {
    "raw": {
      "type": "keyword"
    }
  }
}
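A sketch of how this mapping change might be applied to the existing index. It assumes an Elasticsearch 5.x or 6.x cluster, where the mapping type appears in the URL (the type name keyword is taken from the _type shown in the search response); newer versions drop the type from the path. Documents indexed before the change have no keywordName.raw value until they are reindexed, which the second request does by rewriting every document in place.

```json
PUT ads/_mapping/keyword
{
  "properties": {
    "keywordName": {
      "type": "text",
      "analyzer": "custom_stop",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}

POST ads/_update_by_query?conflicts=proceed
```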
And the query:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "keywordName": {
              "query": "iphone 8",
              "operator": "and"
            }
          }
        },
        {
          "term": {
            "keywordName.raw": {
              "value": "iphone 8"
            }
          }
        }
      ]
    }
  },
  "size": 10
}
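In the query above the exact document wins only if the term clause adds enough score on top of the match clause. A possible variant, sketched here, gives the term clause an explicit boost so the exact keyword reliably comes first, and turns on explain to show how each score is built. The boost value of 10 is an arbitrary assumption to tune, not something from the original answer.

```json
GET ads/_search
{
  "explain": true,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "keywordName": {
              "query": "iphone 8",
              "operator": "and"
            }
          }
        },
        {
          "term": {
            "keywordName.raw": {
              "value": "iphone 8",
              "boost": 10
            }
          }
        }
      ]
    }
  },
  "size": 10
}
```

Note that the term clause is not analyzed: it only fires when the input equals the stored value character for character, including case and accents. Inputs such as "Iphone 8" still match through the analyzed field but get no exact-match bonus.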