ElasticSearch order by number of matches in nested fields
Complete newbie here, quite possibly trying to do the impossible.
I want to store the following structure in Elasticsearch:
{
  "id" : 1,
  "code" : "03f3301c-4089-11e7-a919-92ebcb67fe33",
  "countries" : [
    {
      "id" : 1,
      "name" : "Netherlands"
    },
    {
      "id" : 2,
      "name" : "United Kingdom"
    }
  ],
  "tags" : [
    {
      "id" : 1,
      "name" : "Scanned"
    },
    {
      "id" : 2,
      "name" : "Secured"
    },
    {
      "id" : 3,
      "name" : "Cleared"
    }
  ]
}
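I have full control over how it is stored, so the structure can change, but it should contain all of these fields in some form.
I want to be able to query this data by countries and tags so that every item with at least one match is returned, ordered by the number of matches. If possible, I would rather not do a full-text search.
For example: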
id, code, country ids, tag ids
1, ..., [1, 2, 3], [1]
2, ..., [1], [1, 2, 3]
For the question "which of these was in country 1 or has tag 1 or has tag 2", it should return:
2, ..., [1], [1, 2, 3]
1, ..., [1, 2, 3], [1]
In that order, because the second row matches more subqueries of the disjunction above (three, versus two for the first row).
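To make the intended ordering concrete, here is a small Python sketch that is independent of Elasticsearch: it counts how many of the requested ids each row contains, drops rows with no match, and sorts by that count, highest first. The function name rank_by_matches and the tuple layout are assumptions made purely for illustration.

```python
def rank_by_matches(rows, country_ids, tag_ids):
    """Return (row_id, match_count) pairs, most matches first.

    rows: iterable of (row_id, country_id_list, tag_id_list) tuples.
    """
    wanted_countries = set(country_ids)
    wanted_tags = set(tag_ids)
    ranked = []
    for row_id, countries, tags in rows:
        # Each requested id that the row contains counts as one match.
        count = len(wanted_countries & set(countries)) + len(wanted_tags & set(tags))
        if count > 0:  # keep only rows with at least one match
            ranked.append((row_id, count))
    # sorted() is stable, so rows with equal counts keep their input order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# The two example rows: (id, country ids, tag ids)
rows = [
    (1, [1, 2, 3], [1]),
    (2, [1], [1, 2, 3]),
]

# "country 1 or tag 1 or tag 2"
result = rank_by_matches(rows, country_ids=[1], tag_ids=[1, 2])
print(result)  # [(2, 3), (1, 2)]
```

Row 2 comes first with three matches (country 1, tag 1, tag 2), followed by row 1 with two (country 1, tag 1).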
Essentially, I want to replicate this SQL query:
SELECT p.id, p.code, COUNT(p.id) FROM packages p
LEFT JOIN tags t ON t.package_id = p.id
LEFT JOIN countries c ON c.package_id = p.id
WHERE t.id IN (1, 2, 3) OR c.id IN (1, 2, 3)
GROUP BY p.id
ORDER BY COUNT(p.id);
If it matters, I am using ElasticSearch 2.4.5.
I hope I have made myself clear. Thanks for your help!
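You need countries and tags to be of type nested. You also need to control the scoring with function_score: give each filter function inside function_score a weight of 1, and set boost_mode to replace and score_mode to sum, so that the weights of the matching filters are added up and replace the score of the inner query. In the end you can use this query: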
GET /nested/test/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "filter": {
            "nested": {
              "path": "tags",
              "query": {
                "term": {
                  "tags.id": 1
                }
              }
            }
          },
          "weight": 1
        },
        {
          "filter": {
            "nested": {
              "path": "tags",
              "query": {
                "term": {
                  "tags.id": 2
                }
              }
            }
          },
          "weight": 1
        },
        {
          "filter": {
            "nested": {
              "path": "countries",
              "query": {
                "term": {
                  "countries.id": 1
                }
              }
            }
          },
          "weight": 1
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum"
    }
  }
}
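Because the query repeats one function entry per id, the request body can be generated rather than written by hand. The following is a minimal Python sketch that only builds the body as a dict and prints it as JSON; it does not talk to Elasticsearch, and the helper names nested_term_filter and build_match_count_query are hypothetical.

```python
import json


def nested_term_filter(path, wanted_id):
    """One function_score entry: weight 1 if a nested object under `path` has this id."""
    return {
        "filter": {
            "nested": {
                "path": path,
                "query": {"term": {path + ".id": wanted_id}},
            }
        },
        "weight": 1,
    }


def build_match_count_query(tag_ids, country_ids):
    """Build a function_score body with one filter function per requested id."""
    functions = [nested_term_filter("tags", i) for i in tag_ids]
    functions += [nested_term_filter("countries", i) for i in country_ids]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "boost_mode": "replace",  # discard the score of the inner query
                "score_mode": "sum",      # add up the weights of the matching filters
            }
        }
    }


body = build_match_count_query(tag_ids=[1, 2], country_ids=[1])
print(json.dumps(body, indent=2))
```

For tag ids 1 and 2 and country id 1, the printed body has the same three functions as the query above.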
For a more complete test case, here are the mapping and the test data as well:
PUT nested
{
  "mappings": {
    "test": {
      "properties": {
        "tags": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "countries": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
POST nested/test/_bulk
{"index":{"_id":1}}
{"name":"Foo Bar","tags":[{"id":2,"name":"My Tag 5"},{"id":3,"name":"My Tag 7"}],"countries":[{"id":1,"name":"USA"}]}
{"index":{"_id":2}}
{"name":"Foo Bar","tags":[{"id":3,"name":"My Tag 6"}],"countries":[{"id":1,"name":"USA"},{"id":2,"name":"UK"},{"id":3,"name":"UAE"}]}
{"index":{"_id":3}}
{"name":"Foo Bar","tags":[{"id":1,"name":"My Tag 4"},{"id":3,"name":"My Tag 1"}],"countries":[{"id":3,"name":"UAE"}]}
{"index":{"_id":4}}
{"name":"Foo Bar","tags":[{"id":1,"name":"My Tag 1"},{"id":2,"name":"My Tag 4"},{"id":3,"name":"My Tag 2"}],"countries":[{"id":2,"name":"UK"},{"id":3,"name":"UAE"}]}
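As a rough check of the ordering to expect on this test data, the Python sketch below mimics the query by adding 1 for every filter that matches a document. It is a plain simulation and not Elasticsearch's scoring code; the names DOCS, FILTERS and simulated_score are made up for this example. Every test document matches at least one filter, so the case of a document matching none (whose score depends on function_score's own defaults) is deliberately not modelled.

```python
# Test documents from the bulk request, reduced to their nested ids.
DOCS = {
    1: {"tags": [2, 3], "countries": [1]},
    2: {"tags": [3], "countries": [1, 2, 3]},
    3: {"tags": [1, 3], "countries": [3]},
    4: {"tags": [1, 2, 3], "countries": [2, 3]},
}

# One (path, id) pair per function in the query, each with weight 1.
FILTERS = [("tags", 1), ("tags", 2), ("countries", 1)]


def simulated_score(doc):
    """Sum of the weights (all 1) of the filters that match the document."""
    return sum(1 for path, wanted in FILTERS if wanted in doc[path])


scores = {doc_id: simulated_score(doc) for doc_id, doc in DOCS.items()}
print(scores)  # {1: 2, 2: 1, 3: 1, 4: 2}

# Highest score first; ties are broken by id here only to make the output deterministic.
ranking = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
print(ranking)  # [1, 4, 2, 3]
```

Documents 1 and 4 each match two filters and documents 2 and 3 match one. The query itself does not define an order among documents with equal scores, so the tie-break by id is an assumption of this sketch only.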