ElasticSearch starts-with query for autocomplete feature
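I want to build an autocomplete feature using ElasticSearch and C#, but I am not getting the desired results. For demo purposes, this is what I have done.
1) Created an index called "names":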
PUT names?pretty
2) Added 20 entries using the POST command:
POST names/_doc/1
{
  "name" : "John Smith"
}
3) The list of names:
[ "John Smith", "John Smitha", "John Smithb", "John Smithc", "John Smithd", "John Smithe", "John Smithf",
"John Smithg", "John Smithh", "John Smithi", "Smith John", "Smitha John", "Smithb John", "Smithc John",
"Smithd John", "Smithe John", "Smithf John", "Smithg John", "Smithh John", "Smithi John" ]
4) When I run a prefix query:
GET names/_search
{
  "query": {
    "prefix": {
      "name": {
        "value": "Smith"
      }
    }
  }
}
I expect to get back "Smith John", "Smitha John", ...
but I am getting back "John Smith", "John Smitha", ...
What am I doing wrong? What do I need to change, and where?
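You are defining the name field as a text field, which by default uses ES's standard analyzer. That analyzer splits the text into tokens and converts them to lowercase. You can test this by using the analyze API.
Example of the tokens produced by the keyword analyzer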
URL: http://{{hostname}}:{{port}}/{{index}}/_analyze
{
  "text": "John Smith",
  "analyzer": "keyword"
}
Output of the above API call:
{
  "tokens": [
    {
      "token": "John Smith",
      "start_offset": 0,
      "end_offset": 10,
      "type": "word",
      "position": 0
    }
  ]
}
Note that it does not split the text and stores it as-is, as explained in the official ES doc.
Tokens produced by the standard analyzer
{
  "text": "John Smith",
  "analyzer": "standard"
}
Output of the above API call:
{
  "tokens": [
    {
      "token": "john",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "smith",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
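The difference between the two analyzers can be approximated in a few lines of Python. This is only a rough sketch: keyword_analyze and standard_analyze are hypothetical helper names, and the regular expression is a simplification of the standard analyzer's Unicode-based tokenization that should be adequate for plain ASCII names like the ones in the question.

```python
import re


def keyword_analyze(text):
    """Approximate the keyword analyzer: the whole input is one token."""
    return [text]


def standard_analyze(text):
    """Rough stand-in for the standard analyzer (assumption: ASCII input).

    Splits on anything that is not a letter or digit and lowercases
    each resulting token.
    """
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


print(keyword_analyze("John Smith"))   # ['John Smith']
print(standard_analyze("John Smith"))  # ['john', 'smith']
```

The point to take away is that the keyword analyzer keeps the original casing and word order in a single term, while the standard analyzer produces separate lowercase terms.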
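Now, the prefix query is not analyzed and is sent to ES as-is, so Smith (note the capital S) is sent to ES for token matching. With the updated mapping below, only documents whose name begins with Smith have a token with that prefix, and only those appear in the search results.
Mapping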
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "keyword"
      }
    }
  }
}
Search query
{
  "query": {
    "prefix": {
      "name": {
        "value": "Smith"
      }
    }
  }
}
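EDIT: Updated the settings based on the OP's comments. With the above mapping and search query, only results starting with Smith are returned, as shown in the output below: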
{
  "took": 811,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "6",
        "_score": 1.0,
        "_source": {
          "name": "Smith John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "7",
        "_score": 1.0,
        "_source": {
          "name": "Smithb John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "8",
        "_score": 1.0,
        "_source": {
          "name": "Smithc John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "9",
        "_score": 1.0,
        "_source": {
          "name": "Smithd John"
        }
      },
      {
        "_index": "59977669",
        "_type": "_doc",
        "_id": "10",
        "_score": 1.0,
        "_source": {
          "name": "Smithe John"
        }
      }
    ]
  }
}
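To see why the choice of analyzer changes which documents a prefix query returns, here is a toy in-memory model in Python. It is a sketch under stated assumptions, not how Lucene actually executes the query: keyword_terms, standard_terms and prefix_search are hypothetical helpers, the standard analyzer is reduced to "split on non-alphanumerics and lowercase", and the prefix is compared against the indexed terms exactly as given.

```python
import re

# The 20 names from the question.
SUFFIXES = ["", "a", "b", "c", "d", "e", "f", "g", "h", "i"]
NAMES = [f"John Smith{s}" for s in SUFFIXES] + [f"Smith{s} John" for s in SUFFIXES]


def keyword_terms(text):
    """Keyword analyzer model: one term, original casing preserved."""
    return [text]


def standard_terms(text):
    """Simplified standard analyzer model: lowercase word terms."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def prefix_search(docs, prefix, analyze):
    """Return the docs that have at least one indexed term starting with prefix.

    The prefix itself is not analyzed; it is compared as-is.
    """
    return [doc for doc in docs if any(t.startswith(prefix) for t in analyze(doc))]


# Whole-name terms: only names that begin with "Smith" match.
keyword_hits = prefix_search(NAMES, "Smith", keyword_terms)

# Per-word lowercase terms: every name contains a term starting with "smith".
standard_hits = prefix_search(NAMES, "smith", standard_terms)

print(len(keyword_hits), len(standard_hits))
```

In this model, matching against per-word terms cannot distinguish "John Smith" from "Smith John", which is why the whole name has to be indexed as a single term for a starts-with search.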
You need to run the prefix query on the name.keyword field instead of the name field.
GET names/_search
{
  "query": {
    "prefix": {
      "name.keyword": {
        "value": "Smith"
      }
    }
  }
}
The reason is that the name.keyword field is of type keyword and is not analyzed (i.e. one token, John Smith, is indexed), so you can perform an exact match query on it. The name field is of type text and is analyzed (i.e. two tokens, john and smith, are indexed), so your exact match (or prefix match) query does not work as expected.
You can read more about it here
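If the query body is built in application code, the only thing that changes is the field name. The snippet below is a minimal sketch in Python (the question uses C#, so treat this purely as an illustration); build_prefix_query is a hypothetical helper, and it assumes the name.keyword sub-field exists, which is what dynamic mapping creates by default for string fields.

```python
import json


def build_prefix_query(field, value):
    """Build the request body for a prefix query on the given field."""
    return {"query": {"prefix": {field: {"value": value}}}}


# Target the un-analyzed keyword sub-field rather than the analyzed text field.
body = build_prefix_query("name.keyword", "Smith")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of GET names/_search.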