Bring back all relevant results when using ngrams with Elasticsearch
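I indexed my Elasticsearch index with ngrams to get fast fuzzy matching and prefix search. I noticed that if I search the name field for documents containing "Bob", only documents with name = Bob are returned. I would like the response to include documents with the name Bob, but also documents with names such as Bobbi, Bobbette, and so on. The Bob results should have a relatively high score. The other results that do not match exactly should still appear in the result set, but with lower scores. How can I achieve this with ngrams?
I am testing with a very small, simple index. The index contains two documents.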
{
  "_index": "contacts_4",
  "_type": "_doc",
  "_id": "1",
  "_score": 1.0,
  "_source": {
    "full_name": "Bob Smith"
  }
},
{
  "_index": "contacts_4",
  "_type": "_doc",
  "_id": "2",
  "_score": 1.0,
  "_source": {
    "full_name": "Bobby Smith"
  }
}
Here is a working example (using the n-gram tokenizer):
Mapping
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "token_chars": [
            "letter",
            "digit"
          ],
          "min_gram": "3",
          "type": "ngram",
          "max_gram": "4"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "full_name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
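To see why a query for "Bob" also reaches "Bobby" and "Bobbette", it can help to look at the tokens this analyzer produces. The snippet below is only a rough Python emulation of the settings above, not Elasticsearch code: the function name ngram_tokens is made up for illustration, and the regular expression merely approximates token_chars = letter, digit.

```python
import re


def ngram_tokens(text, min_gram=3, max_gram=4):
    """Approximate my_analyzer: split on non letter/digit characters,
    emit every 3- and 4-character gram of each word, then lowercase."""
    tokens = []
    # Anything that is not a letter or digit acts as a separator.
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size].lower())
    return tokens


print(ngram_tokens("Bob Smith"))
# ['bob', 'smi', 'smit', 'mit', 'mith', 'ith']
print(ngram_tokens("Bobby Smith"))
# ['bob', 'bobb', 'obb', 'obby', 'bby', 'smi', 'smit', 'mit', 'mith', 'ith']
```

By default a match query runs the search text through the same analyzer as the field, so "Bob" becomes the single token bob, which appears in the token list of every sample name. The real token stream can be checked against this sketch with the _analyze API on my_index.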
Index documents
POST my_index/_doc/1
{
  "full_name": "Bob Smith"
}
POST my_index/_doc/2
{
  "full_name": "Bobby Smith"
}
POST my_index/_doc/3
{
  "full_name": "Bobbette Smith"
}
Search query
GET my_index/_search
{
  "query": {
    "match": {
      "full_name": "Bob"
    }
  }
}
Result
"hits" : [
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.1626403,
    "_source" : {
      "full_name" : "Bob Smith"
    }
  },
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.13703513,
    "_source" : {
      "full_name" : "Bobby Smith"
    }
  },
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 0.11085624,
    "_source" : {
      "full_name" : "Bobbette Smith"
    }
  }
]
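The ordering of these scores seems to come from BM25 length normalization: each name contains the token bob exactly once, but the longer names produce more n-grams, so the single match counts for less. The sketch below is a plain Python re-computation, not Elasticsearch code. It assumes the default BM25 parameters (k1 = 1.2, b = 0.75), a single shard holding only these three documents, and the n-gram settings from the mapping; the helper names gram_count and bm25_scores are invented for illustration.

```python
import math
import re


def gram_count(text, min_gram=3, max_gram=4):
    """Count the n-grams the analyzer would emit for one field value."""
    count = 0
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, max_gram + 1):
            count += max(0, len(word) - size + 1)
    return count


def bm25_scores(docs, k1=1.2, b=0.75):
    """BM25 score of the single query term 'bob' for each document.

    Assumes every document contains the term exactly once, which
    holds for the three sample names.
    """
    lengths = [gram_count(doc) for doc in docs]
    avg_len = sum(lengths) / len(lengths)
    n_docs = len(docs)
    n_match = n_docs  # all sample documents contain 'bob'
    idf = math.log(1 + (n_docs - n_match + 0.5) / (n_match + 0.5))
    freq = 1
    return [
        (k1 + 1) * idf * freq / (freq + k1 * (1 - b + b * length / avg_len))
        for length in lengths
    ]


names = ["Bob Smith", "Bobby Smith", "Bobbette Smith"]
for name, score in zip(names, bm25_scores(names)):
    print(f"{name}: {score:.7f}")
```

Under these assumptions the three values agree with the _score fields above to about six decimal places, which suggests that the exact name ranks first simply because its field is the shortest. On a larger index or with several shards the numbers would differ, although shorter matching fields would generally still be favored.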
Hope this helps.