Correct index for partial search in Elastic
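I have 3 fields: ID, reference_id, and postal_code. I want to be able to search on these fields, for example:
If ID is ABCDEFGH, searching DEF will show it.
If there are postal codes 29019 and 27829, searching 2 will show both. Searching 29 will show 29019.
Is this an autocomplete filter? Or a wildcard filter? I read in the Elastic docs that putting * before and after the value in a wildcard filter is bad, so I'd like to know what the best filter is for this.
Thanks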
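You can achieve your use case by using the ngram and edge n-gram tokenizers.
Adding a working example with index mapping, index data, search query, and search result.
Index mapping: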
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "tokenizer": "ngram_tokenizer"
        },
        "edge_ngram_analyzer": {
          "tokenizer": "edge_tokenizer"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        },
        "edge_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "index.max_ngram_diff": 10
  },
  "mappings": {
    "properties": {
      "ID": {
        "type": "text",
        "analyzer": "ngram_analyzer"
      },
      "postal_codes": {
        "type": "text",
        "analyzer": "edge_ngram_analyzer"
      }
    }
  }
}
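To see why the mapping uses ngram for ID but edge_ngram for postal_codes, here is a rough Python sketch of the tokens each tokenizer would produce with min_gram 1 and max_gram 10. It is only an approximation for illustration, not Elasticsearch's implementation: the function names are made up, and the regex stands in for the "letter" and "digit" token_chars.

```python
import re


def split_on_token_chars(text):
    # Approximates token_chars ["letter", "digit"]: keep runs of letters and
    # digits, split on everything else (assumption, not the real tokenizer).
    return re.findall(r"[^\W_]+", text)


def ngram_tokens(text, min_gram=1, max_gram=10):
    # Every substring of length min_gram..max_gram, from every start position.
    tokens = []
    for chunk in split_on_token_chars(text):
        for start in range(len(chunk)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(chunk):
                    break
                tokens.append(chunk[start:start + size])
    return tokens


def edge_ngram_tokens(text, min_gram=1, max_gram=10):
    # Only substrings anchored at the start of each chunk (prefixes).
    tokens = []
    for chunk in split_on_token_chars(text):
        for size in range(min_gram, min(max_gram, len(chunk)) + 1):
            tokens.append(chunk[:size])
    return tokens


print(edge_ngram_tokens("29019"))          # ['2', '29', '290', '2901', '29019']
print("DEF" in ngram_tokens("ABCDEFGH"))   # True: infix search works on ID
print("29" in edge_ngram_tokens("27829"))  # False: only prefixes are indexed
print("29" in ngram_tokens("27829"))       # True: plain ngram would also match 27829
```

The last line shows the reason for the split: with a plain ngram tokenizer on postal_codes, searching 29 would also return 27829, which is not what the question asks for.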
Index data:
{
  "postal_codes": "27829"
}
{
  "ID": "ABCDEFGH"
}
{
  "postal_codes": "29019"
}
If ID is ABCDEFGH, searching DEF will show it.
Search query:
{
  "query": {
    "term": {
      "ID": "DEF"
    }
  }
}
Search result:
"hits": [
  {
    "_index": "66778583",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.2876821,
    "_source": {
      "ID": "ABCDEFGH"
    }
  }
]
Searching 2 will show both.
Search query:
{
  "query": {
    "term": {
      "postal_codes": "2"
    }
  }
}
Search result:
"hits": [
  {
    "_index": "66778583",
    "_type": "_doc",
    "_id": "3",
    "_score": 0.18232156,
    "_source": {
      "postal_codes": "27829"
    }
  },
  {
    "_index": "66778583",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.18232156,
    "_source": {
      "postal_codes": "29019"
    }
  }
]
Searching 29 will show 29019.
Search query:
{
  "query": {
    "term": {
      "postal_codes": "29"
    }
  }
}
Search result:
"hits": [
  {
    "_index": "66778583",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.6931471,
    "_source": {
      "postal_codes": "29019"
    }
  }
]
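One thing worth noting about the queries above: they use term, which does not analyze the search text, so the input has to equal one of the indexed tokens exactly. Since the custom analyzers here have no lowercase filter, that comparison is case-sensitive. A match query would instead run the search text through the field's analyzer (unless a separate search_analyzer is configured), which with an ngram analyzer can match far more than intended. The sketch below simulates both behaviours in plain Python; the helper names are hypothetical and it only models the token comparison, not scoring.

```python
def ngram_tokens(text, min_gram=1, max_gram=10):
    # Set of all substrings of length min_gram..max_gram (simplified ngram).
    return {
        text[i:i + n]
        for i in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(text)
    }


def term_query(indexed_tokens, term):
    # term: the search text is compared as-is against the indexed tokens.
    return term in indexed_tokens


def match_query(indexed_tokens, text):
    # match with the default OR operator: the search text is analyzed too,
    # and any single shared token is enough for a hit.
    return any(token in indexed_tokens for token in ngram_tokens(text))


indexed = ngram_tokens("ABCDEFGH")

print(term_query(indexed, "DEF"))   # True
print(term_query(indexed, "def"))   # False: case differs, no lowercase filter
print(term_query(indexed, "DEX"))   # False
print(match_query(indexed, "DEX"))  # True: "D", "E" and "DE" are all indexed
```

If case-insensitive search is needed, adding a lowercase filter to both analyzers and lowercasing the search input would be one way to get it; that is a suggestion on top of the answer, not part of the original mapping.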