Elasticsearch: find documents containing not more terms than in the query
If I have these documents:
1: { "name": "red yellow" }
2: { "name": "green yellow" }
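I want to query with "red brown yellow" and get document 1.
I mean the query must contain at least every term from the document, but it may contain more. If a document contains a token that is not in the query, it should not be a hit.
How can I do this? The other way around is easy...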
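The requirement can be restated as a subset test: a document is a hit when every one of its tokens also appears in the query. A minimal Python sketch of that rule, assuming simple whitespace tokenization and lowercasing (the helper names `tokenize` and `is_hit` are illustrative and not part of Elasticsearch):

```python
def tokenize(text):
    """Split on whitespace and lowercase; a rough stand-in for a real analyzer."""
    return set(text.lower().split())


def is_hit(doc_text, query_text):
    """Return True when every document token also occurs in the query."""
    return tokenize(doc_text) <= tokenize(query_text)


docs = {1: "red yellow", 2: "green yellow"}
query = "red brown yellow"

hits = [doc_id for doc_id, text in docs.items() if is_hit(text, query)]
print(hits)  # [1]
```

Document 2 is rejected because `green` is not among the query tokens.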
First, you have to declare the field with fielddata: true so that a script can run on it:
PUT test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
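Then you can filter the results with a script query, passing the query terms as script parameters so that each token is compared exactly:
POST test/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": """
              boolean res = true;
              for (item in doc['name']) {
                res = params.terms.contains(item) && res;
              }
              return res;
            """,
            "lang": "painless",
            "params": {
              "terms": ["red", "brown", "yellow"]
            }
          }
        }
      },
      "must": [
        {
          "match": {
            "name": "red brown yellow"
          }
        }
      ]
    }
  }
}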
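One detail worth checking in a script filter of this kind is how each token is compared with the query. Testing a token against the whole query string is a substring test, so a token such as `low` would pass because it occurs inside `yellow`; testing against a list of terms is an exact comparison. The Python sketch below only imitates the loop from the script (it is not Painless, and `passes_substring` and `passes_exact` are hypothetical names):

```python
QUERY = "red brown yellow"
QUERY_TERMS = QUERY.split()


def passes_substring(doc_tokens):
    """Every token must occur somewhere inside the raw query string."""
    return all(token in QUERY for token in doc_tokens)


def passes_exact(doc_tokens):
    """Every token must equal one of the query terms."""
    return all(token in QUERY_TERMS for token in doc_tokens)


print(passes_substring(["red", "yellow"]), passes_exact(["red", "yellow"]))  # True True
print(passes_substring(["red", "low"]), passes_exact(["red", "low"]))        # True False
```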
Note that fielddata on a text field can be expensive. It would be better if you could index this field as an array of keywords, like this:
1: { "name": ["red","yellow"] }
2: { "name": ["green", "yellow"] }
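The script part of the search request can stay exactly the same. (A keyword field is not split into separate terms at search time, so the match clause would probably need to become a terms query on "red", "brown" and "yellow".)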
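For the keyword-array variant, the request body could be assembled from the query string roughly as sketched below. This is only an assumption-laden sketch: it assumes `name` is mapped as `keyword`, preselects candidates with a `terms` query, and hands the terms to the script through `params`. The helper `build_search_body` is hypothetical, and the body would still have to be sent with whatever client you use.

```python
import json


def build_search_body(query_text):
    """Build a search body that keeps only documents whose terms all occur in the query."""
    terms = query_text.lower().split()
    # Painless source: reject the document as soon as one of its values is not a query term.
    script = (
        "for (item in doc['name']) {"
        " if (!params.terms.contains(item)) { return false; }"
        " } return true;"
    )
    return {
        "query": {
            "bool": {
                # Candidate documents must share at least one term with the query.
                "must": [{"terms": {"name": terms}}],
                "filter": {
                    "script": {
                        "script": {
                            "lang": "painless",
                            "source": script,
                            "params": {"terms": terms},
                        }
                    }
                },
            }
        }
    }


print(json.dumps(build_search_body("red brown yellow"), indent=2))
```

Passing the terms as parameters keeps the script source constant across searches, so only the parameter values change from one request to the next.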
The match query is of type boolean. It means that the text provided is
analyzed and the analysis process constructs a boolean query from the
provided text. The minimum number of optional should clauses to match
can be set using the minimum_should_match parameter.
To learn more about the match query, you can refer to the ES documentation.
Below is the mapping of the name field:
{
  "tests": {
    "mappings": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
Now, when you search for "red brown yellow" with the following query:
POST tests/_search
{
  "query": {
    "match": {
      "name": {
        "query": "red brown yellow",
        "minimum_should_match": "75%"
      }
    }
  }
}
You get the result you want:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.87546873,
    "hits": [
      {
        "_index": "tests",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.87546873,
        "_source": {
          "name": "red yellow"
        }
      }
    ]
  }
}
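The output does not include green yellow. With three query terms, 75% works out to 2.25, which is rounded down to two required terms: document 1 matches two of them (red and yellow), while the second document matches only one (yellow).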
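A rough Python imitation of the threshold check, assuming the percentage is rounded down as the Elasticsearch documentation for minimum_should_match describes, and assuming plain whitespace tokenization. It does not call Elasticsearch, and the function names are illustrative. The third sample document is not from the question; it is added here to suggest that a document with an extra token such as `green` would still pass, since the percentage limits how few query terms may match rather than which tokens the document may contain.

```python
def required_matches(num_terms, percent):
    """Number of query terms that must match, with the percentage rounded down."""
    return num_terms * percent // 100


def matched_terms(doc_text, query_text):
    """Count the distinct query terms that occur in the document."""
    return len(set(query_text.split()) & set(doc_text.split()))


query = "red brown yellow"
needed = required_matches(len(query.split()), 75)  # 3 terms at 75% -> 2

for text in ["red yellow", "green yellow", "red green yellow"]:
    print(text, matched_terms(text, query) >= needed)
```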