Elasticsearch中如何对文档进行分组,并获取每组中的文档?
How to group documents in Elasticsearch and get the documents in each group?
我的 Elasticsearch 索引包含与类别具有非规范化 m:n 关系的产品。
我的目标是从中导出包含相同信息但关系倒置的类别索引。
索引如下所示:
PUT /products
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"article_id": {
"type": "keyword"
},
"categories": {
"type": "nested",
"properties": {
"cat_name": {
"type": "keyword"
}
}
}
}
}
}
包含这样创建的文档:
POST /products/_doc
{
"name": "radio",
"article_id": "1001",
"categories": [
{ "cat_name": "audio" },
{ "cat_name": "electronics" }
]
}
POST /products/_doc
{
"name": "fridge",
"article_id": "1002",
"categories": [
{ "cat_name": "appliances" },
{ "cat_name": "electronics" }
]
}
我想从 Elasticsearch 得到类似这样的信息:
{
"name": "appliances",
"products": [
{
"name": "fridge",
"article_id": "1002"
}
]
},
{
"name": "audio",
"products": [
{
"name": "radio",
"article_id": "1001"
}
]
},
{
"name": "electronics",
"products": [
{
"name": "fridge",
"article_id": "1002"
},
{
"name": "radio",
"article_id": "1001"
}
]
}
最终将被放入索引中,例如:
PUT /categories
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"products": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"article_id": {
"type": "keyword"
}
}
}
}
}
}
如果不以编程方式加载和分组所有产品,我不知道如何执行此操作。
这是我尝试过的:
字段上的桶聚合 categories.cat_name
这为我提供了每个类别的文档计数,但没有提供产品文档。使用 top_hits 子聚合似乎限制为 100 个文档。
组使用带扩展的折叠字段
只能在单值字段上折叠。
我正在使用 Elasticsearch 8.1。
您需要查询的是这个:
POST products/_search
{
"size": 0,
"aggs": {
"cats": {
"nested": {
"path": "categories"
},
"aggs": {
"categories": {
"terms": {
"field": "categories.cat_name",
"size": 10
},
"aggs": {
"root": {
"reverse_nested": {},
"aggs": {
"products": {
"terms": {
"field": "name",
"size": 10
}
}
}
}
}
}
}
}
}
}
它会生成您所需要的(减去文章 ID,但这很简单):
"buckets" : [
{
"key" : "electronics",
"doc_count" : 2,
"root" : {
"doc_count" : 2,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "fridge",
"doc_count" : 1
},
{
"key" : "radio",
"doc_count" : 1
}
]
}
}
},
{
"key" : "appliances",
"doc_count" : 1,
"root" : {
"doc_count" : 1,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "fridge",
"doc_count" : 1
}
]
}
}
},
{
"key" : "audio",
"doc_count" : 1,
"root" : {
"doc_count" : 1,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "radio",
"doc_count" : 1
}
]
}
}
}
]
我的 Elasticsearch 索引包含与类别具有非规范化 m:n 关系的产品。
我的目标是从中导出包含相同信息但关系倒置的类别索引。
索引如下所示:
PUT /products
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"article_id": {
"type": "keyword"
},
"categories": {
"type": "nested",
"properties": {
"cat_name": {
"type": "keyword"
}
}
}
}
}
}
包含这样创建的文档:
POST /products/_doc
{
"name": "radio",
"article_id": "1001",
"categories": [
{ "cat_name": "audio" },
{ "cat_name": "electronics" }
]
}
POST /products/_doc
{
"name": "fridge",
"article_id": "1002",
"categories": [
{ "cat_name": "appliances" },
{ "cat_name": "electronics" }
]
}
我想从 Elasticsearch 得到类似这样的信息:
{
"name": "appliances",
"products": [
{
"name": "fridge",
"article_id": "1002"
}
]
},
{
"name": "audio",
"products": [
{
"name": "radio",
"article_id": "1001"
}
]
},
{
"name": "electronics",
"products": [
{
"name": "fridge",
"article_id": "1002"
},
{
"name": "radio",
"article_id": "1001"
}
]
}
最终将被放入索引中,例如:
PUT /categories
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"products": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"article_id": {
"type": "keyword"
}
}
}
}
}
}
如果不以编程方式加载和分组所有产品,我不知道如何执行此操作。 这是我尝试过的:
字段上的桶聚合 categories.cat_name
这为我提供了每个类别的文档计数,但没有提供产品文档。使用 top_hits 子聚合似乎限制为 100 个文档。
组使用带扩展的折叠字段
只能在单值字段上折叠。
我正在使用 Elasticsearch 8.1。
您需要查询的是这个:
POST products/_search
{
"size": 0,
"aggs": {
"cats": {
"nested": {
"path": "categories"
},
"aggs": {
"categories": {
"terms": {
"field": "categories.cat_name",
"size": 10
},
"aggs": {
"root": {
"reverse_nested": {},
"aggs": {
"products": {
"terms": {
"field": "name",
"size": 10
}
}
}
}
}
}
}
}
}
}
它会生成您所需要的(减去文章 ID,但这很简单):
"buckets" : [
{
"key" : "electronics",
"doc_count" : 2,
"root" : {
"doc_count" : 2,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "fridge",
"doc_count" : 1
},
{
"key" : "radio",
"doc_count" : 1
}
]
}
}
},
{
"key" : "appliances",
"doc_count" : 1,
"root" : {
"doc_count" : 1,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "fridge",
"doc_count" : 1
}
]
}
}
},
{
"key" : "audio",
"doc_count" : 1,
"root" : {
"doc_count" : 1,
"products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "radio",
"doc_count" : 1
}
]
}
}
}
]