How to get an Elasticsearch terms aggregation for multi-valued fields using an NGram filter for autocompletion?
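I'm working on my autocomplete project and I'm new to Elasticsearch. I'm using an Edge NGram filter for autocompletion.
I'm trying to get unique results for autocompletion, so I used a terms aggregation on all the fields.
For fields with a single value I get good results, but multi-valued fields are a problem: if the query matches at least one value in the field, the aggregation returns every value in that field, whether or not the query matches the other values.
My settings and mappings for the garments index are: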
PUT /garments
{
  "settings": {
    "number_of_replicas": 3,
    "number_of_shards": 2,
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "color": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
      ........
      ........
      ........
    }
  }
}
(Note that I'm using the text type.)
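To see why "bl" matches some colors and not others, here is a rough Python sketch of what the two analyzers above do at index time (autocomplete, built on the edge_ngram tokenizer) and at search time (autocomplete_search). It is an approximation for illustration only, not Elasticsearch code: the helper names are hypothetical, and the regex [^\W\d_]+ is assumed to stand in for the "letter" token_chars class.

```python
import re

LETTERS = r"[^\W\d_]+"  # approximation of token_chars: ["letter"]

def edge_ngram_analyze(text, min_gram=2, max_gram=10):
    """Approximate the "autocomplete" analyzer: edge_ngram tokenizer
    (split on non-letters, emit word prefixes) followed by lowercase."""
    tokens = []
    for word in re.findall(LETTERS, text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n].lower())
    return tokens

def lowercase_analyze(text):
    """Approximate the "autocomplete_search" analyzer (lowercase tokenizer)."""
    return [w.lower() for w in re.findall(LETTERS, text)]

def matches(query, value):
    """A match hits if any search token equals any indexed gram (OR semantics)."""
    indexed = set(edge_ngram_analyze(value))
    return any(tok in indexed for tok in lowercase_analyze(query))

colors = ["blue", "black", "orange", "marble", "jet black"]
print(edge_ngram_analyze("jet black"))  # ['je', 'jet', 'bl', 'bla', 'blac', 'black']
print([c for c in colors if matches("bl", c)])  # ['blue', 'black', 'jet black']
```

"marble" never produces the gram "bl", because edge n-grams are prefixes of each word; only words that start with "bl" can match.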
Suppose a document has a multi-valued color field, for example: ["blue", "black", "orange", "marble", "jet black"]
My search query is:
GET /garments/_search
{
  "size": 0,
  "query": {
    "query_string": {
      "query": "bl"
    }
  },
  "aggs": {
    "Term_aggregation": {
      "terms": {
        "field": "color.keyword",
        "size": 100
      }
    }
  }
}
This returns every value: "blue", "black", "orange", "marble", "jet black".
But I only want blue, black and jet black as my results (the query is "bl").
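After that, I used
"include": ".*bl.*"
to filter the aggs according to my condition. The result is blue, black, marble, jet black. Also, this include filter is case-sensitive. Please help!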
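The include regex of a terms aggregation is applied to the whole bucket key (it is implicitly anchored) and is case-sensitive. The Python sketch below uses re.fullmatch as an approximation of Lucene's regular-expression matching for these simple patterns; the capitalised "Black" is a hypothetical value added only to show the case sensitivity.

```python
import re

def include_filter(keys, pattern):
    """Mimic terms-agg "include": the pattern must match the whole key
    (like re.fullmatch) and matching is case-sensitive."""
    return [k for k in keys if re.fullmatch(pattern, k)]

keys = ["blue", "Black", "orange", "marble", "jet black"]

# Substring test: keeps "marble", misses "Black" because of the capital B.
print(include_filter(keys, ".*bl.*"))      # ['blue', 'marble', 'jet black']

# A stray leading space would require every key to start with a space.
print(include_filter(keys, " .*bl.*"))     # []

# "bl" only at the start of a word, closer to the edge_ngram behaviour.
print(include_filter(keys, "(.* )?bl.*"))  # ['blue', 'jet black']
```

".*bl.*" is a substring test, which is why "marble" shows up. A word-start pattern such as "(.* )?bl.*" is only a suggestion here; verify it against your own data before relying on it.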
If you want case-insensitive matching on a keyword field, you can use a normalizer with a lowercase filter:
The normalizer property of keyword fields is similar to analyzer except that it guarantees that the analysis chain produces a single token.
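{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter"
          ]
        }
      },
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "color": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}
With this mapping, "include": ".*bl.*" works even if the original values contain uppercase letters, because the keyword terms are stored (and returned as bucket keys) in lowercase.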
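A Python sketch of the effect of the lowercase normalizer: keyword values are lowercased before they become terms, so the include regex only ever sees lowercase keys. The mixed-case values and the helper names are hypothetical, and re.fullmatch again approximates Lucene regex matching.

```python
import re

def lowercase_normalizer(value):
    """Approximate the custom normalizer: a single token, lowercase filter only."""
    return value.lower()

def terms_with_include(values, pattern, normalizer=None):
    """Build distinct terms-agg keys from keyword values (optionally normalized),
    then keep only keys that fully match the include pattern."""
    keys = sorted({normalizer(v) if normalizer else v for v in values})
    return [k for k in keys if re.fullmatch(pattern, k)]

values = ["Blue", "BLACK", "orange", "jet black"]  # hypothetical mixed-case input

print(terms_with_include(values, ".*bl.*"))
# ['jet black']  (case-sensitive: "Blue" and "BLACK" are missed)
print(terms_with_include(values, ".*bl.*", lowercase_normalizer))
# ['black', 'blue', 'jet black']
```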
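Edit 1
Based on your comment: if you don't want to use include in the terms aggregation, you need to index your colors with the nested type so that each color is treated as a separate object.
Mapping: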
PUT index64
{
  "settings": {
    "number_of_replicas": 3,
    "number_of_shards": 2,
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "color": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "text",
            "analyzer": "autocomplete",
            "search_analyzer": "autocomplete_search",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}
Document:
POST index64/_doc
{
  "color": [
    {
      "name": "blue"
    },
    {
      "name": "black"
    },
    {
      "name": "orange"
    },
    {
      "name": "marble"
    },
    {
      "name": "jet black"
    }
  ]
}
Query:
GET index64/_search
{
  "size": 0,
  "aggs": {
    "color": {
      "nested": {
        "path": "color"
      },
      "aggs": {
        "select_color": {
          "filter": {
            "match": {
              "color.name": "bl"
            }
          },
          "aggs": {
            "distinct_colors": {
              "terms": {
                "field": "color.name.keyword",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}
Result:
"aggregations" : {
  "color" : {
    "doc_count" : 5,
    "select_color" : {
      "doc_count" : 3,
      "distinct_colors" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "black",
            "doc_count" : 1
          },
          {
            "key" : "blue",
            "doc_count" : 1
          },
          {
            "key" : "jet black",
            "doc_count" : 1
          }
        ]
      }
    }
  }
}
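A rough Python model of the nested approach, assuming a simple approximation of the autocomplete analyzer (edge n-grams of each letter-only word, lowercased); the helper names are hypothetical. Each color object becomes its own hidden sub-document, so the filter and the terms aggregation work per color rather than per garment, and the counts line up with the response above.

```python
import re
from collections import Counter

def edge_grams(text, min_gram=2, max_gram=10):
    """Approximate the "autocomplete" analyzer (edge_ngram on letters + lowercase)."""
    return {w[:n] for w in re.findall(r"[^\W\d_]+", text.lower())
            for n in range(min_gram, min(max_gram, len(w)) + 1)}

def nested_filtered_terms(docs, query):
    """Sketch of nested -> filter(match) -> terms on color.name.keyword."""
    # nested: every color object is indexed as its own sub-document
    nested_docs = [c for d in docs for c in d["color"]]
    # filter: match query; search side uses a lowercase tokenizer, OR semantics
    q_tokens = re.findall(r"[^\W\d_]+", query.lower())
    selected = [c for c in nested_docs
                if any(t in edge_grams(c["name"]) for t in q_tokens)]
    # terms: one bucket per distinct name among the selected sub-documents
    buckets = Counter(c["name"] for c in selected)
    return len(nested_docs), len(selected), dict(sorted(buckets.items()))

doc = {"color": [{"name": n} for n in ["blue", "black", "orange", "marble", "jet black"]]}
print(nested_filtered_terms([doc], "bl"))
# (5, 3, {'black': 1, 'blue': 1, 'jet black': 1})
```

The three numbers correspond to the nested doc_count, the filter doc_count and the buckets in the result.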