edge_ngram filter and not_analyzed to match search
I have the following Elasticsearch configuration:
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        },
        "snow_filter": {
          "type": "snowball",
          "language": "English"
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "snow_filter",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "name": {
        "type": "multi_field",
        "fields": {
          "name": {
            "type": "string",
            "index_analyzer": "autocomplete",
            "search_analyzer": "snowball"
          },
          "not": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "name": "Brown foxes" }
{ "index": { "_id": 2 }}
{ "name": "Yellow furballs" }
{ "index": { "_id": 3 }}
{ "name": "my discovery" }
{ "index": { "_id": 4 }}
{ "name": "myself is fun" }
{ "index": { "_id": 5 }}
{ "name": ["foxy", "foo"] }
{ "index": { "_id": 6 }}
{ "name": ["foo bar", "baz"] }
I'm trying to search for, and return only, item 6, which has a name of "foo bar", but I'm not quite sure how. This is what I'm doing right now:
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": {
        "query": "foo b"
      }
    }
  }
}
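I know it has to do with how the tokenizer splits the words, but I'm a bit lost on how to make the match both flexible and strict enough. I'm guessing I need a multi-field on my name mapping, but I'm not sure. How can I fix the query and/or my mapping to meet my needs?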
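You're already close. The edge_ngram filter in your analyzer generates tokens with a minimum length of 1, your query is tokenized into "foo" and "b", and the default match query operator is "or". As a result, your query matches every document that has a term starting with "b" (or "foo"), which is three of the documents (1, 5, and 6).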
Using the "and" operator seems to do what you want:
POST /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": {
        "query": "foo b",
        "operator": "and"
      }
    }
  }
}
...
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4451914,
    "hits": [
      {
        "_index": "test_index",
        "_type": "my_type",
        "_id": "6",
        "_score": 1.4451914,
        "_source": {
          "name": [
            "foo bar",
            "baz"
          ]
        }
      }
    ]
  }
}
Here is the code I used to test it:
http://sense.qbox.io/gist/4f6fb7c1fdc6942023091ee1433d7490e04e7dea
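If it helps to see why the operator matters, here is a rough Python sketch that imitates the analysis outside Elasticsearch. It is only an approximation under stated assumptions: a regex stands in for the standard tokenizer, the snowball stemming and stop-word steps are left out (they do not change which prefixes match for this query), and the helper names `edge_ngrams`, `index_terms`, `search_terms`, and `match` are made up for illustration and are not part of any Elasticsearch API.

```python
import re

def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return every prefix of `token` with length min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, upper + 1)}

def index_terms(values):
    """Approximate the index-time `autocomplete` analyzer:
    split into words, lowercase, then expand into edge n-grams.
    Assumption: stemming (snow_filter) is skipped in this sketch."""
    if isinstance(values, str):
        values = [values]
    terms = set()
    for value in values:
        for token in re.findall(r"\w+", value.lower()):
            terms |= edge_ngrams(token)
    return terms

def search_terms(query):
    """Approximate the search-time analyzer: lowercase words, no n-grams."""
    return re.findall(r"\w+", query.lower())

# Same documents as in the bulk request from the question.
DOCS = {
    1: "Brown foxes",
    2: "Yellow furballs",
    3: "my discovery",
    4: "myself is fun",
    5: ["foxy", "foo"],
    6: ["foo bar", "baz"],
}

def match(query, operator="or"):
    """Return ids of documents whose indexed terms contain any ("or")
    or all ("and") of the query terms."""
    wanted = search_terms(query)
    combine = all if operator == "and" else any
    return sorted(
        doc_id
        for doc_id, name in DOCS.items()
        if combine(term in index_terms(name) for term in wanted)
    )

print(match("foo b"))         # [1, 5, 6]
print(match("foo b", "and"))  # [6]
```

With "or", document 1 matches on the "b" prefix of "brown" and document 5 matches on "foo", so both come back alongside document 6. With "and", only document 6 has both a "foo" term and a term starting with "b".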