Elasticsearch - Apply the appropriate analyzer to get accurate results
I am new to Elasticsearch. I want to apply an analyzer that satisfies the searches below.
Let's take an example.
Suppose I have indexed the following texts in my documents:
- I am walking
- I walked to Ahmedabad
- A walk every morning
- Anil walks in the evening.
- I am hiring candidates
- I hired candidates
- I hire candidates every day
- He hires candidates
Now when I search with
- the text "walking", the result should be [walking, walked, walk, walks]
- the text "walked", the result should be [walking, walked, walk, walks]
- the text "walk", the result should be [walking, walked, walk, walks]
- the text "walks", the result should be [walking, walked, walk, walks]
The same should hold for hire:
- the text "hiring", the result should be [hiring, hired, hire, hires]
- the text "hired", the result should be [hiring, hired, hire, hires]
- the text "hire", the result should be [hiring, hired, hire, hires]
- the text "hires", the result should be [hiring, hired, hire, hires]
Thanks,
You need to use the stemmer token filter.
Stemming is the process of reducing a word to its root form. This ensures variants of a word match during a search. For example, walking and walked can be stemmed to the same root word: walk. Once stemmed, an occurrence of either word would match the other in a search.
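You can see the effect of the stemmer token filter directly with the _analyze API, without creating an index first (a quick sketch; the stemmer filter defaults to the English stemmer):

GET _analyze
{
  "tokenizer": "standard",
  "filter": [ "lowercase", "stemmer" ],
  "text": "Walking walked walks walk"
}

All four tokens are reduced to "walk".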
Mapping
PUT index36
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "stemmer" ]
        }
      }
    }
  }
}

Note that lowercase is applied before stemmer, so capitalized words such as "Walking" are stemmed correctly too.
Analyze
GET index36/_analyze
{
  "text": ["walking", "walked", "walk", "walks"],
  "analyzer": "my_analyzer"
}
Result
{
  "tokens" : [
    {
      "token" : "walk",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "walk",
      "start_offset" : 8,
      "end_offset" : 14,
      "type" : "word",
      "position" : 101
    },
    {
      "token" : "walk",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "word",
      "position" : 202
    },
    {
      "token" : "walk",
      "start_offset" : 20,
      "end_offset" : 25,
      "type" : "word",
      "position" : 303
    }
  ]
}
All four words produce the same token, "walk". Therefore any one of these words will match the others in a search.
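To see this end to end, you could index one of the example sentences and then search for a different variant of the word (a sketch; the document and query text are taken from the question above):

PUT index36/_doc/1
{
  "title": "I am walking"
}

GET index36/_search
{
  "query": {
    "match": {
      "title": "walked"
    }
  }
}

Because both the indexed text and the query text pass through my_analyzer, "walked" stems to "walk" and matches the document containing "walking".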
What you are looking for is a language analyzer; see the documentation here.
A language analyzer is built from a tokenizer and a chain of token filters, as in the following example.
PUT /english_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": ["example"]
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "rebuilt_english": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  }
}
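Once the index exists, you can check what the rebuilt_english analyzer does to a phrase (a sketch using one of the question's example sentences):

GET english_example/_analyze
{
  "analyzer": "rebuilt_english",
  "text": "Anil walks in the evening"
}

The stop words ("in", "the") are removed and the remaining tokens are lowercased and stemmed, so you should get tokens along the lines of anil, walk, even (the English stemmer also stems "evening").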
You can now use the analyzer in your index mapping like this:
{
  "mappings": {
    "properties": {
      "myField": {
        "type": "text",
        "analyzer": "rebuilt_english"
      }
    }
  }
}

Note that the field type must be text, not keyword: keyword fields are not analyzed, so an analyzer setting would have no effect on them.
Remember to use a match query in order to search full text.
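A match query analyzes the query text with the field's analyzer, so searching for one variant of a word matches the others (a sketch, assuming myField is mapped with rebuilt_english as shown above):

GET english_example/_search
{
  "query": {
    "match": {
      "myField": "hires"
    }
  }
}

This would also match documents whose myField contains "hiring", "hired", or "hire".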