Elasticsearch index and search time analyzer for field mapping doesn't work
I'm new to Elasticsearch and I want to offer "search as you type" functionality. The text to be searched is at most 50 characters per field. The search should find all documents that contain the search text, similar to a "wildcard term" query à la '*query*', but that is very expensive.
That's why I tried to do it as described in this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html. The only difference in my case is that I want to use an 'n-gram' analyzer instead of an 'edge n-gram' analyzer.
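To illustrate that difference (a minimal Python sketch, not Elasticsearch code): an `edge_ngram` filter only emits prefixes, so it can never match text from the middle of a value, while a full `ngram` filter emits every substring in the configured length range:

```python
def ngrams(text, min_gram=3, max_gram=5):
    """All substrings of length min_gram..max_gram, like the ES `ngram` filter."""
    return [text[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(text) - n + 1)]

def edge_ngrams(text, min_gram=3, max_gram=5):
    """Only leading substrings, like the ES `edge_ngram` filter."""
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]

print(edge_ngrams("omah0rn"))           # ['oma', 'omah', 'omah0']
print("mah" in ngrams("omah0rn"))       # True  -> infix matching is possible
print("mah" in edge_ngrams("omah0rn"))  # False -> prefixes only
```

This is why `*query*`-style matching needs full n-grams, at the price of a much larger index.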
I created the following custom analyzers:
"settings": {
"index": {
"max_ngram_diff": "50",
[...]
"analysis": {
"filter": {
"3-50-grams-filter": {
"type": "ngram",
"min_gram": "3",
"max_gram": "50"
}
},
"analyzer": {
"index-3-50-grams-analyzer": {
"filter": [
"lowercase",
"3-50-grams-filter"
],
"type": "custom",
"tokenizer": "keyword"
},
"search-3-50-grams-analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
}
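Assuming the settings above, the index-time chain (keyword tokenizer, then lowercase, then the 3-50 ngram filter) can be simulated in plain Python to see which terms should end up in the index (a sketch of the intended behavior, not the actual Lucene analysis):

```python
def index_analyze(text, min_gram=3, max_gram=50):
    # keyword tokenizer: the whole input stays one token
    token = text.lower()  # lowercase filter
    # ngram filter: every substring of length min_gram..max_gram
    return {token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)}

terms = index_analyze("1107811#1OMAH0RN03D2")
# The search analyzer (keyword + lowercase) emits the query string as one
# lowercased token, so a match should succeed whenever that token occurs
# as a substring of the indexed value:
print("omah0rn03d2" in terms)  # True
print("mah0rn03d2" in terms)   # True
```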
I created the following mapping:
"mappings": {
"dynamic": "strict",
"properties": {
"my-field": {
"type": "text",
"fields": {
"my-field": {
"type": "text",
"analyzer": "index-3-50-grams-analyzer",
"search_analyzer": "search-3-50-grams-analyzer"
},
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
POSTed the following document:
{
"my-field": "1107811#1OMAH0RN03D2"
}
Sent the following to the _analyze API:
{
"text" : "1107811#1OMAH0RN03D2",
"field" : "my-field"
}
...and got this result:
{
"tokens": [
{
"token": "1107811",
"start_offset": 0,
"end_offset": 7,
"type": "<NUM>",
"position": 0
},
{
"token": "1omah0rn03d2",
"start_offset": 8,
"end_offset": 20,
"type": "<ALPHANUM>",
"position": 1
}
]
}
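As a cross-check, the `_analyze` API can also be called with an explicit analyzer name instead of a field path; this runs the named analyzer directly, independent of how the mapping resolves the field (index name omitted as in the request above):

```
{
  "text": "1107811#1OMAH0RN03D2",
  "analyzer": "index-3-50-grams-analyzer"
}
```

If this returns the expected n-gram tokens while the `"field"` variant does not, the analyzer itself is fine and the field path is resolving to a different (default-analyzed) field.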
- It seems the search_analyzer (although defined in the field mapping) is not applied automatically.
- Even when I specify the search_analyzer explicitly in the query, I don't get the expected results.
A query like this finds the document:
"query": {
"match": {
"my-field": {
"query": "1OMAH0RN03D2"
}
}
}
...but a query like this does not (just the first character removed):
"query": {
"match": {
"my-field": {
"query": "OMAH0RN03D2"
}
}
}
...and neither does a query with an explicit search_analyzer (with one more character removed):
"query": {
"match": {
"my-field": {
"query": "MAH0RN03D2",
"analyzer": "search-3-50-grams-analyzer"
}
}
}
Does anyone have an idea what could be causing this behavior?
Not sure, but I tried it with your sample document and index settings and it works fine for me. Below are the steps I followed.
Index mapping and settings
{
"settings": {
"index": {
"max_ngram_diff": "50",
"analysis": {
"filter": {
"3-50-grams-filter": {
"type": "ngram",
"min_gram": "3",
"max_gram": "50"
}
},
"analyzer": {
"index-3-50-grams-analyzer": {
"filter": [
"lowercase",
"3-50-grams-filter"
],
"type": "custom",
"tokenizer": "keyword"
},
"search-3-50-grams-analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
}
},
"mappings": {
"properties": {
"myfield": {
"type": "text",
"analyzer": "index-3-50-grams-analyzer",
"search_analyzer": "search-3-50-grams-analyzer"
}
}
}
}
Indexing the sample document
{
"myfield" : "1107811#1OMAH0RN03D2"
}
Search query
{
"query": {
"match": {
"myfield": {
"query": "OMAH0RN03D2"
}
}
}
}
Search result
"hits": [
{
"_index": "edgesearch",
"_type": "_doc",
"_id": "1",
"_score": 0.4848835,
"_source": {
"myfield": "1107811#1OMAH0RN03D2"
}
}
]
Edit: Based on the comments, the OP is using a multi field, and the analyzer definitions were assigned at a deeper (sub-field) level, which caused the issue. Including this information here, as it resolved the question.
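Given that explanation, a sketch of a query consistent with the OP's multi-field mapping would target the sub-field path where the n-gram analyzers actually live (field path taken from the mapping in the question):

```
{
  "query": {
    "match": {
      "my-field.my-field": {
        "query": "MAH0RN03D2"
      }
    }
  }
}
```

Querying plain `my-field` hits the top-level text field, which falls back to the `standard` analyzer, which is also why the `_analyze` call with `"field": "my-field"` returned `<NUM>`/`<ALPHANUM>` tokens instead of n-grams.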