Elasticsearch More Like This query with filter is adding results
I have the following type definition, "taggeable":
{
  "mappings": {
    "taggeable": {
      "_all": {"enabled": false},
      "properties": {
        "category": {
          "type": "string"
        },
        "tags": {
          "type": "string",
          "term_vector": "yes"
        }
      }
    }
  }
}
I also have these 5 documents:
Document1 (tags: "t1 t2", category: "cat1")
Document2 (tags: "t1" , category: "cat1")
Document3 (tags: "t1 t3", category: "cat1")
Document4 (tags: "t4" , category: "cat1")
Document5 (tags: "t4" , category: "cat2")
The following query:
{
  "query": {
    "more_like_this": {
      "fields": ["tags"],
      "like": ["t1", "t2"],
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
returns:
Document1 (tags: "t1 t2", category: "cat1")
Document2 (tags: "t1" , category: "cat1")
Document3 (tags: "t1 t3", category: "cat1")
This is correct, but this query:
{
"query": {
"filtered": {
"query": {
"more_like_this" : {
"fields" : ["tags"],
"like" : ["t1", "t2"],
"min_term_freq" : 1,
"min_doc_freq": 1
},
"filter": {
"bool": {
"must": [
{"match": { "category": "cat1"}}
]
}
}
}
}
}
}
returns:
Document1 (tags: "t1 t2", category: "cat1")
Document4 (tags: "t4" , category: "cat1")
Document2 (tags: "t1" , category: "cat1")
Document3 (tags: "t1 t3", category: "cat1")
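That is, Document4 is now retrieved as well, with a score close to that of Document1 (a perfect match), even though Document4 contains none of the terms in "t1 t2".
Does anyone know what is going on? I am using Elasticsearch 2.4.6.
Thanks in advance.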
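This is a good example of why consistent indentation matters. Here I have reformatted what you posted with consistent indentation, and the problem becomes more obvious (JSONLint is a handy tool if you are not using an editor that helps with this):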
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": ["tags"],
          "like": ["t1", "t2"],
          "min_term_freq": 1,
          "min_doc_freq": 1
        },
        "filter": {
          "bool": {
            "must": [{
              "match": {
                "category": "cat1"
              }
            }]
          }
        }
      }
    }
  }
}
Your "filter" is a child of the inner "query" (a sibling of "more_like_this"), not a child of "filtered".
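For illustration, here is a sketch of the same request with "filter" moved up one level so that it sits next to the inner "query" under "filtered". It reuses only the fields and values from the question and has not been run against your index, so treat it as a starting point rather than a verified fix:

```json
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": ["tags"],
          "like": ["t1", "t2"],
          "min_term_freq": 1,
          "min_doc_freq": 1
        }
      },
      "filter": {
        "bool": {
          "must": [{
            "match": {
              "category": "cat1"
            }
          }]
        }
      }
    }
  }
}
```

The only change from the reformatted query above is where the closing brace of the inner "query" falls: it now comes before "filter" instead of after it.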
That said, you should not be using "filtered" at all; it is deprecated, see here. You should change it to a "bool" query, as specified here.
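A rough sketch of what the "bool" form might look like for this case, assuming the more_like_this clause goes under "must" (so it still contributes to the score) and the category restriction goes under "filter" (so it does not):

```json
{
  "query": {
    "bool": {
      "must": {
        "more_like_this": {
          "fields": ["tags"],
          "like": ["t1", "t2"],
          "min_term_freq": 1,
          "min_doc_freq": 1
        }
      },
      "filter": {
        "match": {
          "category": "cat1"
        }
      }
    }
  }
}
```

The "match" on category is kept as in the question. Whether a "term" filter would suit the data better depends on how that field is analyzed, which the mapping shown does not make explicit beyond "type": "string".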