Elasticsearch fuzzy search phrase with dash
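I'm trying to find a way to index a document with the description "In-N-Out Burger" so that a search for "in n out", "in and out", or simply "in-n-out" returns the "In-N-Out Burger" document. Going through the documentation, I'm confused about how to handle dashes at index time or at search time. Any suggestions?
My current settings and mapping: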
curl -XPUT http://localhost:9200/objects -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "lower": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}'
curl -XPUT http://localhost:9200/objects/object/_mapping -d '{
  "object" : {
    "properties" : {
      "objectDescription" : {
        "type" : "string",
        "fields" : {
          "lower": {
            "type": "string",
            "analyzer": "lower"
          }
        }
      },
      "suggest" : {
        "type" : "completion",
        "analyzer" : "simple",
        "search_analyzer" : "simple",
        "payloads" : true
      }
    }
  }
}'
When I created the index with your settings and put a document into it, I did not run into any problems:
curl -XPUT http://localhost:9200/objects/object/001 -d '{
  "description": "In-N-Out Burger",
  "name" : "first_document"
}'
Then I tried to find it:
curl -XGET 'localhost:9200/objects/object/_search?q=in+and+out&pretty'
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.05038611,
    "hits" : [ {
      "_index" : "objects",
      "_type" : "object",
      "_id" : "001",
      "_score" : 0.05038611,
      "_source" : {
        "description" : "In-N-Out Burger",
        "name" : "first_document"
      }
    } ]
  }
}
or
curl -XGET 'localhost:9200/objects/object/_search?pretty&q=in-n-out'
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23252454,
    "hits" : [ {
      "_index" : "objects",
      "_type" : "object",
      "_id" : "001",
      "_score" : 0.23252454,
      "_source" : {
        "description" : "In-N-Out Burger",
        "name" : "first_document"
      }
    } ]
  }
}
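As you can see, the document is found. The analyzer treats "-" as a separator and splits the phrase into separate terms, both when indexing the document and when searching for it. You can see how this works: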
curl -XGET 'localhost:9200/objects/_analyze?pretty=true' -d 'In-N-Out Burger'
{
  "tokens" : [ {
    "token" : "in",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "n",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "out",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "burger",
    "start_offset" : 9,
    "end_offset" : 15,
    "type" : "<ALPHANUM>",
    "position" : 3
  } ]
}
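To make the token-level reasoning concrete, here is a small Python sketch of what seems to be going on. It is only an approximation under stated assumptions, not Elasticsearch's real implementation: `standard_like_tokens` stands in for the default analyzer with a simple regex, `keyword_lower_tokens` stands in for the `lower` analyzer from the question, and `matches_any` assumes the OR semantics that the `q=` query string uses by default. All three function names are made up for illustration. Note also that the test document uses a `description` field, which the mapping above does not define, so it was presumably analyzed with the default analyzer rather than with `lower`.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the default analyzer (an assumption, not the real
    tokenizer): split on anything that is not a letter or digit, lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def keyword_lower_tokens(text):
    """Stand-in for the question's `lower` analyzer: the keyword tokenizer
    keeps the whole input as a single token, which is then lowercased."""
    return [text.lower()]


def matches_any(query, indexed_tokens, analyze):
    """OR semantics: the document matches if at least one analyzed query
    term is among the indexed tokens."""
    return any(term in indexed_tokens for term in analyze(query))


doc = "In-N-Out Burger"

# Index side: the dash acts as a separator, as in the _analyze output above.
indexed = set(standard_like_tokens(doc))
print(standard_like_tokens(doc))  # ['in', 'n', 'out', 'burger']

# Search side: each query is analyzed the same way, so all three share terms
# with the indexed document.
for q in ["in n out", "in and out", "in-n-out"]:
    print(q, "->", matches_any(q, indexed, standard_like_tokens))

# With the keyword-based `lower` analyzer the only indexed token is
# "in-n-out burger", so a partial phrase no longer matches.
indexed_kw = set(keyword_lower_tokens(doc))
print(matches_any("in-n-out", indexed_kw, keyword_lower_tokens))         # False
print(matches_any("In-N-Out Burger", indexed_kw, keyword_lower_tokens))  # True
```

In this sketch "in and out" matches only because "in" and "out" are indexed; "and" is not. If every term must match, or if queries go against the `lower` sub-field, the result would be different, so it is worth checking which field and analyzer your real query actually hits.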