Elasticsearch aggregations for faceted search excluding some fields

Question

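I have a shop that uses Elasticsearch 2.4 for faceted search. However, the existing filters (product attributes) are currently taken from MySQL. I want to build them with Elasticsearch aggregations instead, but I have run into a problem: I don't need to aggregate all of the attributes.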

What I have:

Part of the mapping:

...
'is_active' => [
    'type' => 'long',
    'index' => 'not_analyzed',
],
'category_id' => [
    'type' => 'long',
    'index' => 'not_analyzed',
],
'attrs' => [
    'properties' => [
        'attr_name' => ['type' => 'string', 'index'     => 'not_analyzed'],
        'value' => [
            'type' => 'string',
            'index' => 'analyzed',
            'analyzer' => 'attrs_analizer',
        ],
    ]
],
...

Data example:

{
    "id": 1,
    "is_active": "1",
    "category_id": 189,
    ...
    "price": "48.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "TP-Link"
      },
      {
        "attr_name": "Model",
        "value": "TL-1"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  },
  {
    "id": 2,
    "is_active": "1",
    "category_id": 242,
    ...
    "price": "12.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Lenovo"
      },
      {
        "attr_name": "Model",
        "value": "B570"
      },
      {
        "attr_name": "OS",
        "value": "Linux"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  },
  {
    "id": 3,
    "is_active": "1",
    "category_id": 242,
    ...
    "price": "24.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Asus"
      },
      {
        "attr_name": "Model",
        "value": "QZ85"
      },
      {
        "attr_name": "OS",
        "value": "Windows"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  }

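Attributes such as "Model" and "Other" are not used for filtering products; they are only shown on the product page. For the other attributes (Brand, OS, and so on) I want to get aggregations.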

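When I try to aggregate on the attrs.value field, I of course get an aggregation over all the data (including the large "Other" field, which can contain a lot of HTML):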

"aggs": {
    "facet_value": {
      "terms": {
        "field": "attrs.value",
        "size": 0
      }
    }
  }

How can I exclude "attrs.attr_name": ["Model", "Other"]?

Changing the mapping would be a bad solution for me, but if it is unavoidable, please tell me how to do it. I suppose I need to make "attrs" nested?
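Why excluding attribute names does not work with the current mapping: attrs is a plain object field, and Elasticsearch indexes an array of objects as flattened multi-valued fields (attrs.attr_name, attrs.value), so the link between a name and its value is lost. A Python sketch that mimics this flattening for one sample product (flatten_object_array is a hypothetical helper, not an Elasticsearch API):

```python
from collections import defaultdict


def flatten_object_array(doc, field):
    """Mimic how an object (non-nested) array is indexed: every sub-field
    becomes one multi-valued field, and the name/value pairing disappears."""
    flat = defaultdict(list)
    for obj in doc[field]:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


product = {
    "id": 1,
    "attrs": [
        {"attr_name": "Brand", "value": "TP-Link"},
        {"attr_name": "Model", "value": "TL-1"},
        {"attr_name": "Other", "value": "<div>Some text of 'Other' property</div>"},
    ],
}

flat = flatten_object_array(product, "attrs")
print(flat["attrs.attr_name"])  # ['Brand', 'Model', 'Other']
print(flat["attrs.value"])      # all three values, including the HTML

excluded = {"Model", "Other"}
names = flat["attrs.attr_name"]
# bool/must_not terms [Model, Other] at product level: a terms query matches
# if ANY value matches, so the whole product is dropped, Brand included.
kept_by_must_not = not any(name in excluded for name in names)
# terms [Brand, OS] at product level: the product is kept, but the terms
# aggregation on attrs.value then still sees every value of it.
kept_by_terms = any(name in {"Brand", "OS"} for name in names)
print(kept_by_must_not, kept_by_terms)  # False True
```

Either way, a product-level filter cannot pick out individual name/value pairs, which is what the nested type is for.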

Update:

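I want to get:
1. All attributes of the products in a given category, except the ones I specify in my system settings (in this example I exclude "Model" and "Other").
2. The number of products next to each value.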

It should look like this:

For the category "Laptops":

Brand:

Lenovo (18)
Asus (19)
.....

OS:

Windows (19)
Linux (5)
...

For "computer monitors":

Brand:

Samsung (18)
LG (19)
.....

Resolution:

1360x768 (19)
1920x1080 (22)
.....

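It is a terms aggregation; I already use one to count the number of products in each category. I tried it on attrs.value, but I don't know how to exclude the attrs.value entries that belong to "attrs.attr_name": "Model" and "attrs.attr_name": "Other".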
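The target output above boils down to: for each attribute name that is not excluded, how many products in the category have each value. As a reference for what the Elasticsearch query has to reproduce, here is a plain-Python sketch over the three sample products from the question (HTML shortened). expected_facets is a hypothetical helper name, not part of any library.

```python
from collections import Counter, defaultdict

products = [
    {"id": 1, "category_id": 189, "attrs": [
        {"attr_name": "Brand", "value": "TP-Link"},
        {"attr_name": "Model", "value": "TL-1"},
        {"attr_name": "Other", "value": "<div>...</div>"}]},
    {"id": 2, "category_id": 242, "attrs": [
        {"attr_name": "Brand", "value": "Lenovo"},
        {"attr_name": "Model", "value": "B570"},
        {"attr_name": "OS", "value": "Linux"},
        {"attr_name": "Other", "value": "<div>...</div>"}]},
    {"id": 3, "category_id": 242, "attrs": [
        {"attr_name": "Brand", "value": "Asus"},
        {"attr_name": "Model", "value": "QZ85"},
        {"attr_name": "OS", "value": "Windows"},
        {"attr_name": "Other", "value": "<div>...</div>"}]},
]


def expected_facets(products, category_id, excluded):
    """Return {attr_name: Counter({value: number_of_products})}."""
    facets = defaultdict(Counter)
    for product in products:
        if product["category_id"] != category_id:
            continue
        # Count each (name, value) pair at most once per product.
        pairs = {(a["attr_name"], a["value"]) for a in product["attrs"]
                 if a["attr_name"] not in excluded}
        for name, value in pairs:
            facets[name][value] += 1
    return facets


facets = expected_facets(products, 242, excluded={"Model", "Other"})
for name in sorted(facets):
    print(f"{name}:")
    for value, count in facets[name].most_common():
        print(f"  {value} ({count})")
```

Counting each (name, value) pair once per product is what makes these product counts rather than attribute-occurrence counts; in Elasticsearch that is the difference between a nested terms bucket's own doc_count and a reverse_nested count.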

UPD2:

In my case, mapping attrs as a nested type makes the index 30% larger: from 2700Mi to 3510Mi. If there is no other option, I will have to live with it.

Answer 1

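You have to map attrs as a nested type and use nested aggregations. (The mapping below uses the keyword type, which only exists in Elasticsearch 5.0 and later; on 2.4, use "type": "string", "index": "not_analyzed" for attr_name and value instead.)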

PUT no_play
{
  "mappings": {
    "document_type" : {
      "properties": {
        "is_active" : {
          "type": "long"
        },
        "category_id" : {
          "type": "long"
        },
        "attrs" : {
          "type": "nested", 
          "properties": {
            "attr_name" : {
              "type" : "keyword"
            },
            "value" : {
              "type" : "keyword"
            }
          }
        }
      }
    }
  }
}
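The mapping above uses the keyword type, which does not exist in Elasticsearch 2.4. A sketch of the same nested mapping with 2.x field types, built as a Python dict so it can be serialized for PUT no_play. es24_mapping is a hypothetical helper, and ignore_above=256 is an optional assumption added so that long HTML values (such as "Other") are not indexed as terms.

```python
import json


def es24_mapping(ignore_above=256):
    """Nested attrs mapping using Elasticsearch 2.x types (no "keyword")."""
    not_analyzed = {"type": "string", "index": "not_analyzed"}
    return {
        "mappings": {
            "document_type": {
                "properties": {
                    "is_active": {"type": "long"},
                    "category_id": {"type": "long"},
                    "attrs": {
                        "type": "nested",
                        "properties": {
                            "attr_name": dict(not_analyzed),
                            # Optional assumption: strings longer than
                            # ignore_above are not indexed, so long HTML
                            # values never become facet buckets.
                            "value": dict(not_analyzed, ignore_above=ignore_above),
                        },
                    },
                }
            }
        }
    }


# Request body for: PUT no_play
print(json.dumps(es24_mapping(), indent=2))
```

An existing object field cannot be changed to nested in place, so this requires creating a new index and reindexing the products. Values longer than ignore_above are still kept in _source (so they can be shown on the product page) but are not searchable or aggregatable.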


POST no_play/document_type
  {
    "id": 3,
    "is_active": "1",
    "category_id": 242,
    "price": "24.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Asus"
      },
      {
        "attr_name": "Model",
        "value": "QZ85"
      },
      {
        "attr_name": "OS",
        "value": "Windows"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  }

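Since you did not say how you want to aggregate, here are two cases.

Case 1) If you want to count each attribute value individually: the first query below gives you the number of occurrences of each term, i.e. one count per nested attrs object.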

POST no_play/_search
{
  "size": 0,
  "aggs": {
    "nested_aggregation_value": {
      "nested": {
        "path": "attrs"
      },
      "aggs": {
        "value_term": {
          "terms": {
            "field": "attrs.value",
            "size": 10
          }
        }
      }
    }
  }
}

POST no_play/_search
{
  "size": 0,
  "aggs": {
    "nested_aggregation_value": {
      "nested": {
        "path": "attrs"
      },
      "aggs": {
        "value_term": {
          "terms": {
            "field": "attrs.value",
            "size": 10
          },
          "aggs": {
            "reverse_back_to_roots": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}

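Case 2) If you want the number of root documents (products) that have each attrs value, hook a reverse_nested aggregation under the terms aggregation, as in the second query above; it moves the aggregation back up to the level of the root documents.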

Consider the following document:

{
    "id": 3,
    "is_active": "1",
    "category_id": 242,
    "price": "24.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Asus"
      },
      {
        "attr_name": "Model",
        "value": "QZ85"
      },
      {
        "attr_name": "OS",
        "value": "repeated value"
      },
      {
        "attr_name": "Other",
        "value": "repeated value"
      }
    ]
  }

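With the first query, the bucket for 'repeated value' has a doc_count of 2 (two nested attrs objects). With the second query, that bucket's reverse_back_to_roots sub-aggregation reports a doc_count of 1 (one root document).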
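To make the 2-versus-1 difference concrete, here is a small Python model of the counting semantics (not of how Elasticsearch is implemented): each nested attrs object becomes its own entry tagged with its root document's id, case 1 counts entries, and case 2 (reverse_nested) counts distinct root ids. The _root_id key is a hypothetical stand-in for the link Elasticsearch keeps between nested and root documents.

```python
from collections import Counter

doc = {
    "id": 3,
    "attrs": [
        {"attr_name": "Brand", "value": "Asus"},
        {"attr_name": "Model", "value": "QZ85"},
        {"attr_name": "OS", "value": "repeated value"},
        {"attr_name": "Other", "value": "repeated value"},
    ],
}

# Nested mapping: each attrs object is a separate (hidden) document that
# remembers which root document it belongs to.
nested_docs = [dict(obj, _root_id=doc["id"]) for obj in doc["attrs"]]

# Case 1: terms on attrs.value inside "nested" counts nested documents.
occurrences = Counter(d["value"] for d in nested_docs)

# Case 2: reverse_nested counts distinct root documents per value bucket.
root_ids = {}
for d in nested_docs:
    root_ids.setdefault(d["value"], set()).add(d["_root_id"])
root_counts = {value: len(ids) for value, ids in root_ids.items()}

print(occurrences["repeated value"])  # 2
print(root_counts["repeated value"])  # 1
```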

Note

Here is how you can filter out the attributes you want to exclude:

POST no_play/_search
{
    "size": 0,
    "aggs": {
        "nested_aggregation_value": {
            "nested": {
                "path": "attrs"
            },
            "aggs": {
                "filtered_results": {
                    "filter": {
                        "bool": {
                            "must_not": [{
                                "terms": {
                                    "attrs.attr_name": ["Model", "Other"]
                                }
                            }]
                        }
                    },
                    "aggs": {
                        "value_term": {
                            "terms": {
                                "field": "attrs.value",
                                "size": 10
                            }
                        }
                    }
                }
            }
        }
    }
}
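The filter in the query above works because, inside the nested aggregation, it is evaluated once per attrs object rather than once per product. A rough Python model of that pipeline over the two category-242 sample products from the question (HTML shortened; names such as nested_docs and _root_id are illustrative only):

```python
from collections import Counter

products = [
    {"id": 2, "attrs": [
        {"attr_name": "Brand", "value": "Lenovo"},
        {"attr_name": "Model", "value": "B570"},
        {"attr_name": "OS", "value": "Linux"},
        {"attr_name": "Other", "value": "<div>...</div>"}]},
    {"id": 3, "attrs": [
        {"attr_name": "Brand", "value": "Asus"},
        {"attr_name": "Model", "value": "QZ85"},
        {"attr_name": "OS", "value": "Windows"},
        {"attr_name": "Other", "value": "<div>...</div>"}]},
]
excluded = {"Model", "Other"}

# "nested": one entry per attrs object, tagged with the product it came from.
nested_docs = [dict(a, _root_id=p["id"]) for p in products for a in p["attrs"]]

# "filter" with bool/must_not terms: decided per attrs object, not per product.
kept = [d for d in nested_docs if d["attr_name"] not in excluded]

# "value_term": terms aggregation on attrs.value over the surviving objects.
value_term = Counter(d["value"] for d in kept)
print(sorted(value_term))  # ['Asus', 'Lenovo', 'Linux', 'Windows']
```

Brand and OS values still land in one flat list of buckets. To get a separate list per attribute name, as asked in the update, a terms aggregation on attrs.attr_name can be placed between the filter and the value terms aggregation.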


POST no_play/_search
 {
    "size": 0,
    "aggs": {
        "nested_aggregation_value": {
            "nested": {
                "path": "attrs"
            },
            "aggs": {
                "filtered_results": {
                    "filter": {
                        "bool": {
                            "must_not": [{
                                "terms": {
                                    "attrs.attr_name": ["Model", "Other"]
                                }
                            }]
                        }
                    },
                    "aggs": {
                        "value_term": {
                            "terms": {
                                "field": "attrs.value",
                                "size": 10
                            },
                            "aggs": {
                                "reverse_back_to_roots": {
                                    "reverse_nested": {}
                                }
                            }
                        }
                    }
                }
            }
        }
    }
 }
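For the per-attribute output from the update (Brand: Lenovo (18), ...), one option is to nest a terms aggregation on attrs.attr_name between the filter and the value terms, and to restrict the search to one category with a query. A sketch that builds such a request body in Python and formats the response, assuming the nested mapping above with not_analyzed attr_name and value fields. build_facet_query, format_facets and the aggregation names attrs_nested, filtered, by_name, by_value and products are hypothetical choices, not fixed API names.

```python
import json


def build_facet_query(category_id, excluded, terms_size=0):
    # terms_size=0 means "all terms" in Elasticsearch 2.x; 5.0+ rejects 0.
    by_value = {
        "terms": {"field": "attrs.value", "size": terms_size},
        # Count products (root documents), not attrs objects.
        "aggs": {"products": {"reverse_nested": {}}},
    }
    by_name = {
        "terms": {"field": "attrs.attr_name", "size": terms_size},
        "aggs": {"by_value": by_value},
    }
    filtered = {
        # Evaluated per nested attrs object, so only excluded names drop out.
        "filter": {"bool": {"must_not": [
            {"terms": {"attrs.attr_name": sorted(excluded)}}
        ]}},
        "aggs": {"by_name": by_name},
    }
    return {
        "size": 0,  # no hits, aggregations only
        "query": {"term": {"category_id": category_id}},
        "aggs": {
            "attrs_nested": {
                "nested": {"path": "attrs"},
                "aggs": {"filtered": filtered},
            }
        },
    }


def format_facets(response):
    """Turn the aggregation part of a search response into display lines."""
    lines = []
    names = response["aggregations"]["attrs_nested"]["filtered"]["by_name"]
    for name_bucket in names["buckets"]:
        lines.append(f'{name_bucket["key"]}:')
        for value_bucket in name_bucket["by_value"]["buckets"]:
            count = value_bucket["products"]["doc_count"]
            lines.append(f'  {value_bucket["key"]} ({count})')
    return lines


# Body for: POST no_play/_search
print(json.dumps(build_facet_query(242, {"Model", "Other"}), indent=2))
```

Each by_value bucket's own doc_count counts attrs objects, while products.doc_count counts products; format_facets prints the latter.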

Thanks
