How can I refine my Elasticsearch query to better distinguish its results?
My challenge is to build an autocomplete field (Django and ES) where searching for "apeni", "rua apen", or "roa apen" returns "rua apeninos" as the top (or only) option. I have already tried ES's suggest and completion features, but both are prefix-based (they don't work with "apen"). I also tried wildcards, but those can't be combined with fuzziness (so they fail on "roa apini" or "apini"). For now, I am using fuzzy matching.
But even when the query term differs, like "rua ape" or "rua apot", it returns the same two documents, with street_desc equal to "rua apeninos" and "rua apotribu", and both score 1.0.
Query:
{
  "aggs": {
    "addresses": {
      "filters": {
        "filters": {
          "street": {
            "match": {
              "street_desc": {
                "query": "rua ape",
                "fuzziness": "AUTO",
                "prefix_length": 0,
                "max_expansions": 50
              }
            }
          }
        }
      },
      "aggs": {
        "street_bucket": {
          "significant_terms": {
            "field": "street_desc.raw",
            "size": 3
          }
        }
      }
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}
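Two details of this query explain the identical scores. The `match` lives inside a `filters` aggregation and there is no top-level `query`, so every hit keeps the default `_score` of 1.0. Also, `fuzziness: AUTO` allows 0 edits for terms of 1-2 characters, 1 edit for 3-5, and 2 edits for longer terms, so "ape" can never fuzzily reach "apeninos"; only the shared token "rua" matches both documents. A plain-Python edit-distance check (ignoring transpositions, which Elasticsearch also counts as one edit) illustrates the distances involved:

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance (no transpositions)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def auto_fuzziness(term):
    # Elasticsearch "AUTO": 0 edits for length 1-2, 1 for 3-5, 2 for 6+
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2

print(levenshtein("ape", "apeninos"))  # 5, far beyond the single edit AUTO allows for "ape"
print(auto_fuzziness("ape"))           # 1
```

So fuzziness alone cannot turn "ape" into a match for "apeninos"; that is what the edge n-gram approach below addresses at index time.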
Index:
{
  "catalogs": {
    "mappings": {
      "properties": {
        "street_desc": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          },
          "analyzer": "suggest_analyzer"
        }
      }
    }
  }
}
Analyzer (Python):
# elasticsearch-dsl analyzer definition
from elasticsearch_dsl import analyzer, tokenizer, token_filter

suggest_analyzer = analyzer(
    'suggest_analyzer',
    tokenizer=tokenizer("lowercase"),
    filter=[token_filter('stopbr', 'stop', stopwords="_brazilian_")],
    language="brazilian",
    char_filter=["html_strip"]
)
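For context, this analyzer lowercases, splits on non-letter characters, and drops Brazilian-Portuguese stopwords, so `street_desc` is indexed as whole words only; that is why mid-word input like "apen" can never match a whole indexed token. A rough Python simulation (the stopword set here is abbreviated and assumed; Lucene's `_brazilian_` list is much larger):

```python
import re

# a few entries from the Brazilian stopword list (abbreviated; an assumption)
BRAZILIAN_STOPWORDS = {"a", "de", "da", "do", "e", "o", "em", "na", "no"}

def suggest_analyze(text):
    """Approximate suggest_analyzer: lowercase tokenizer + Brazilian stop filter."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # letters only, lowercased
    return [t for t in tokens if t not in BRAZILIAN_STOPWORDS]

print(suggest_analyze("Rua de Apeninos"))
# whole tokens only, no prefixes or n-grams, e.g. ['rua', 'apeninos']
```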
Adding an end-to-end working example, which I tested against all of the given search terms.
Index mapping:
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
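The `autocomplete_filter` above emits every prefix of each token, from 1 to 10 characters, at index time, which is why non-prefix input like "apeni" can still match. A small Python sketch approximating what this `edge_ngram` filter produces (assuming simple whitespace tokenization in place of the `standard` tokenizer):

```python
def edge_ngrams(text, min_gram=1, max_gram=10):
    """Approximate index-time tokens of the `autocomplete` analyzer:
    lowercase, split on whitespace, then emit edge n-grams per token."""
    grams = []
    for tok in text.lower().split():
        for n in range(min_gram, min(max_gram, len(tok)) + 1):
            grams.append(tok[:n])
    return grams

print(edge_ngrams("rua apeninos"))
# includes 'apen' and 'apeni', so those queries hit this document exactly
```

Because the search side uses the plain `standard` analyzer, the query terms themselves are not n-grammed; they just have to equal one of these indexed prefixes (optionally within the fuzzy edit budget).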
Index the sample documents:
{
"title" : "rua apotribu"
}
{
"title" : "rua apeninos"
}
Search query:
{
  "query": {
    "match": {
      "title": {
        "query": "apeni",
        "fuzziness": "AUTO"
      }
    }
  }
}
And the search result:
"hits": [
  {
    "_index": "64881760",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.1026623,
    "_source": {
      "title": "rua apeninos"
    }
  }
]
Now "apen" also returns a result:
"hits": [
  {
    "_index": "64881760",
    "_type": "_doc",
    "_id": "1",
    "_score": 2.517861,
    "_source": {
      "title": "rua apeninos"
    }
  }
]
Now, when the query term differs, e.g. "rua apot", it returns both documents but scores "rua apotribu" much higher, as shown in the search results below.
"hits": [
  {
    "_index": "64881760",
    "_type": "_doc",
    "_id": "2",
    "_score": 2.9289336,
    "_source": {
      "title": "rua apotribu"
    }
  },
  {
    "_index": "64881760",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.41107285,
    "_source": {
      "title": "rua apeninos"
    }
  }
]