Why isn't the shingle token filter in my analyzer yielding the expected results?
Hi, here are my index details:
PUT shingle_test
{
"settings": {
"analysis": {
"analyzer": {
"evolutionAnalyzer": {
"tokenizer": "standard",
"filter": [
"standard",
"custom_shingle"
]
}
},
"filter": {
"custom_stop": {
"type": "stop",
"stopwords": "_english_"
},
"custom_shingle": {
"type": "shingle",
"min_shingle_size": "2",
"max_shingle_size": "10",
"output_unigrams": false
}
}
}
},
"mappings": {
"legacy" : {
"properties": {
"name": {
"type": "text",
"fields": {
"shingles": {
"type": "text",
"analyzer": "standard",
"search_analyzer": "evolutionAnalyzer"
},
"as_is": {
"type": "keyword"
}
},
"analyzer": "standard"
}
}
}
}
}
I added 2 documents:
PUT shingle_test/legacy/1
{
"name": "Chandni Chowk 2 Banglore"
}
PUT shingle_test/legacy/2
{
"name": "Chandni Chowk"
}
If I run this query, nothing is returned:
GET shingle_test/_search
{
"query": {
"match": {
"name": {
"query": "Chandni Chowk",
"analyzer": "evolutionAnalyzer"
}
}
}
}
I've looked online for every possible solution and didn't find one.
Also, if I set "output_unigrams": true, it behaves just like a plain match query and returns results.
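To see why the two settings behave differently, here is a simplified Python model of the tokens evolutionAnalyzer produces for the query text. It is a sketch, not Elasticsearch's implementation: str.split() stands in for the standard tokenizer (adequate for these inputs), positions and offsets are ignored, and the function names are hypothetical.

```python
# Simplified model of evolutionAnalyzer: standard tokenizer + shingle filter
# (min_shingle_size 2, max_shingle_size 10). Assumptions (not Elasticsearch
# code): str.split() approximates the standard tokenizer, no lowercasing is
# applied (evolutionAnalyzer has no lowercase filter), and token
# positions/offsets are ignored.


def shingles(tokens, min_size=2, max_size=10, output_unigrams=False):
    """For each start position, emit the unigram (if enabled) followed by
    shingles of increasing size, up to max_size words."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for a longer shingle
            out.append(" ".join(tokens[start:start + size]))
    return out


def evolution_analyzer(text, output_unigrams=False):
    return shingles(text.split(), 2, 10, output_unigrams)


print(evolution_analyzer("Chandni Chowk"))
# ['Chandni Chowk']
print(evolution_analyzer("Chandni Chowk", output_unigrams=True))
# ['Chandni', 'Chandni Chowk', 'Chowk']
```

With "output_unigrams": false, the query "Chandni Chowk" is reduced to the single two-word token "Chandni Chowk", which can only match an index that stores that same two-word token.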
What I'm trying to achieve:
Given these documents:
1. Chandni Chowk 2 Bangalore
2. Chandni Chowk
3. CCD Bangalore
4. Istah shawarma and biryani
5. Istah
So:
- searching "Chandni Chowk 2 Bangalore" should return 1, 2
- searching "Chandni Chowk" should return 1, 2
- searching "Istah shawarma and biryani" should return 4, 5
- searching "Istah" should return 4, 5
- searching "CCD Bangalore" should return 3
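Note: the search keyword will always be exactly the same as the value of the name field in one of the documents. For example, on this particular index we may query "Chandni Chowk 2 Bangalore", "Chandni Chowk", "CCD Bangalore", "Istah shawarma and biryani" or "Istah". "CCD" on its own will never be queried on this index.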
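The analyzer parameter specifies the analyzer used for text analysis when indexing or searching a text field. In your mapping, both name and name.shingles are indexed with the standard analyzer, so the index only contains single words, whereas evolutionAnalyzer with "output_unigrams": false emits only multi-word shingles such as "Chandni Chowk" at search time. No query term can ever equal an indexed term, so nothing matches. The fix is to use the same shingle analyzer at both index time and search time.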
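Why the original query returns nothing can be reproduced with a toy inverted index in Python. This is only a sketch under simplifying assumptions: str.split() stands in for the standard tokenizer, the standard analyzer is reduced to lowercasing, scoring and positions are ignored, and the helper names are hypothetical. It models the original mapping, where the field is indexed with "standard" and searched with evolutionAnalyzer ("output_unigrams": false).

```python
# Toy model of the ORIGINAL mapping: index with the standard analyzer,
# search with evolutionAnalyzer (shingles only). Assumptions: str.split()
# approximates the standard tokenizer; positions and scoring are ignored.


def standard_analyzer(text):
    # standard analyzer ~ standard tokenizer + lowercase filter
    return [t.lower() for t in text.split()]


def evolution_analyzer(text):
    # standard tokenizer + shingle filter (min 2, max 10, no unigrams);
    # note: no lowercase filter in this analyzer
    tokens = text.split()
    out = []
    for i in range(len(tokens)):
        for size in range(2, 11):
            if i + size > len(tokens):
                break
            out.append(" ".join(tokens[i:i + size]))
    return out


docs = {"1": "Chandni Chowk 2 Banglore", "2": "Chandni Chowk"}
index_terms = {d: set(standard_analyzer(name)) for d, name in docs.items()}
query_terms = set(evolution_analyzer("Chandni Chowk"))

# A match query (default "or" operator) hits a document if any query term
# equals one of its indexed terms.
hits = [d for d, terms in index_terms.items() if terms & query_terms]

print(query_terms)  # {'Chandni Chowk'}
print(hits)         # []
```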
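Delete and recreate your index with the following mapping (the analyzer of an existing field cannot be changed in place):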
PUT shingle_test
{
"settings": {
"analysis": {
"analyzer": {
"evolutionAnalyzer": {
"tokenizer": "standard",
"filter": [
"standard",
"custom_shingle"
]
}
},
"filter": {
"custom_stop": {
"type": "stop",
"stopwords": "_english_"
},
"custom_shingle": {
"type": "shingle",
"min_shingle_size": "2",
"max_shingle_size": "10",
"output_unigrams": true // note this
}
}
}
},
"mappings": {
"legacy" : {
"properties": {
"name": {
"type": "text",
"fields": {
"shingles": {
"type": "text",
"analyzer": "evolutionAnalyzer", // note this
"search_analyzer": "evolutionAnalyzer"
},
"as_is": {
"type": "keyword"
}
},
"analyzer": "standard"
}
}
}
}
}
The modified search query then targets the name.shingles subfield:
GET shingle_test/_search
{
"query": {
"match": {
"name.shingles": {
"query": "Chandni Chowk"
}
}
}
}
Search result:
"hits": [
{
"_index": "66127416",
"_type": "_doc",
"_id": "2",
"_score": 0.25759193,
"_source": {
"name": "Chandni Chowk"
}
},
{
"_index": "66127416",
"_type": "_doc",
"_id": "1",
"_score": 0.19363807,
"_source": {
"name": "Chandni Chowk 2 Banglore"
}
}
]
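A toy model of the modified mapping, where name.shingles is both indexed and searched with evolutionAnalyzer ("output_unigrams": true), returns the same two documents as the search result above. It is a simplified Python sketch with hypothetical helper names: str.split() approximates the standard tokenizer, and relevance scoring is not modelled, so it says nothing about the order or scores of the hits.

```python
# Toy model of the MODIFIED mapping: name.shingles is indexed and searched
# with evolutionAnalyzer (output_unigrams: true). Assumptions: str.split()
# approximates the standard tokenizer; positions and scoring are ignored.


def evolution_analyzer(text):
    # standard tokenizer + shingle filter (min 2, max 10, with unigrams)
    tokens = text.split()
    out = []
    for i in range(len(tokens)):
        out.append(tokens[i])  # unigram, because output_unigrams is true
        for size in range(2, 11):
            if i + size > len(tokens):
                break
            out.append(" ".join(tokens[i:i + size]))
    return out


docs = {"1": "Chandni Chowk 2 Banglore", "2": "Chandni Chowk"}
# Same analyzer at index time...
index_terms = {d: set(evolution_analyzer(name)) for d, name in docs.items()}


def match(query):
    # ...and at search time; "or" semantics: any shared term is a hit
    q = set(evolution_analyzer(query))
    return sorted(d for d, terms in index_terms.items() if terms & q)


print(match("Chandni Chowk"))  # ['1', '2']
```

Because the index and the query now go through the same analyzer, both the unigrams and the shingles of the query text can equal terms stored in the index.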