Elasticsearch: adding a text field to my aggregation
I have article data like this in Elasticsearch:
{
"ArticleId":355027,
"ArticleNumber":"433398",
"CharacteristicsMultiValue":[
{
"Name":"Aantal cartridges",
"Value":"4",
"NumValue":4,
"Priority":2147483647
},
{
"Name":"ADF",
"Value":"Ja",
"Priority":10,
"Description":"Een Automatic Document Feeder (ADF), of automatische documentinvoer, laat een multifunctionele printer (all-in-one) automatisch meerdere vellen na elkaar verwerken. Door meerdere vellen in de ADF te plaatsen, wordt ieder vel papier stuk voor stuk automatisch gekopieerd of gescand."
},
{
"Name":"Scanresolutie",
"Value":"600x600 DPI",
"Priority":2147483647
}
]
}
I run the following query to retrieve all occurrences of CharacteristicsMultiValue for my search, together with all possible values, and sort them to my liking.
{
"query": {
"query_string": {
"query": "433398",
"default_operator": "and"
}
},
"aggs":{
"CharacteristicsMultiValue":{
"nested":{
"path":"CharacteristicsMultiValue"
},
"aggs":{
"Name":{
"terms":{
"field":"CharacteristicsMultiValue.Name",
"size":25
},
"aggs":{
"Value":{
"terms":{
"field":"CharacteristicsMultiValue.Value",
"size":25
}
},
"Priority":{
"avg":{
"field":"CharacteristicsMultiValue.Priority"
}
},
"Characteristics_sort": {
"bucket_sort": {
"sort": [
{ "Priority": { "order": "asc" } }
]
}
}
}
}
}
}
}
}
The result contains a list of CharacteristicsMultiValue buckets like the one below.
{
"key":"ADF",
"doc_count":1,
"Priority":{
"value":10
},
"Value":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"Ja",
"doc_count":1
}
]
}
}
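This all works fine. I want to change it so that the CharacteristicsMultiValue.Description field is included in the aggregation. I'm not very experienced with Elasticsearch, but I feel I should be able to do this quite easily.
I did some research, and as far as I understand I need to add a new sub-aggregation for the description column. I tried adding the JSON below to my current query in several places, but I keep getting 404 errors. Can anyone tell me how to add the (first found) Description field to my aggregation?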
"aggs":{
"Description":{
"terms":{
"field":"CharacteristicsMultiValue.Description",
"size":1
}
}
}
I tested the solution Joe proposed. It results in the following error response:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "articles_dev1_nl",
"node": "HiGH6JY9QvOozRSWJmFXpw",
"reason": {
"type": "illegal_argument_exception",
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [CharacteristicsMultiValue.Description] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
}
},
"status": 400
}
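I'm not sure why you'd be getting 404 errors. When the aggregation syntax is incorrect, the response is usually 400 Bad Request.
Either way, if you want the top Description term under each Value bucket, you can use: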
{
"query": {
"query_string": {
"query": "433398",
"default_operator": "and"
}
},
"aggs": {
"CharacteristicsMultiValue": {
"nested": {
"path": "CharacteristicsMultiValue"
},
"aggs": {
"Name": {
"terms": {
"field": "CharacteristicsMultiValue.Name",
"size": 25
},
"aggs": {
"Value": {
"terms": {
"field": "CharacteristicsMultiValue.Value",
"size": 25
},
"aggs": {
"Description": {
"terms": {
"field": "CharacteristicsMultiValue.Description",
"size": 1
}
}
}
},
"Priority": {
"avg": {
"field": "CharacteristicsMultiValue.Priority"
}
},
"Characteristics_sort": {
"bucket_sort": {
"sort": [
{
"Priority": {
"order": "asc"
}
}
]
}
}
}
}
}
}
}
}
In general, sub-aggregations follow this pattern:
{
"query": { }, // optional query
"aggs": {
"your_agg_name": {
"agg_type": {
// agg spec
},
"aggs": {
"your_sub_agg_name_1": {
"agg_type": {
// agg spec
}
},
"your_sub_agg_name_2_if_needed": {
"agg_type": {
// agg spec
}
},
...
}
}
}
}
You can:
- nest sub-aggs further, as you already do with Name->Value and as my example does with Value->Description
- or keep them on the same level, like Name->Value and Name->Priority.
Tip: your query is already nested quite deeply, so you may want to explore the typed_keys query parameter to more easily determine which bucket corresponds to which sub-aggregation.
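To make the tip concrete: typed_keys is passed as a URL parameter, for example GET articles_dev1_nl/_search?typed_keys with the same request body. Each aggregation name in the response is then prefixed with its internal type. Assuming Name and Value are keyword fields, the ADF bucket from the question would come back roughly as sketched below. The sterms and avg prefixes are what Elasticsearch normally reports for string terms and avg aggregations; the exact prefix depends on the field type, so treat this as an illustration and not as captured output.

```json
{
  "key": "ADF",
  "doc_count": 1,
  "avg#Priority": {
    "value": 10
  },
  "sterms#Value": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "Ja",
        "doc_count": 1
      }
    ]
  }
}
```

The bucket_sort aggregation does not appear in the response because it only reorders the parent buckets.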
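EDIT
As the error message states, the Description field needs to be aggregatable before any aggregations can be performed on it.
So if you can drop and recreate the index, turn fielddata on in the mapping: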
PUT articles_dev1_nl
{
"mappings": {
"properties": {
"CharacteristicsMultiValue": {
"type": "nested",
"properties": {
.... other props ...
"Description": {
"type": "text",
"fielddata": true
}
}
}
}
}
}
Or, if your index already exists, you can use the update mapping API:
PUT articles_dev1_nl/_mapping
{
"properties": {
"CharacteristicsMultiValue": {
"type": "nested",
"properties": {
"Description": {
"type": "text",
"fielddata": true
}
}
}
}
}
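The error message's first suggestion is to use a keyword field instead, and that may be the better fit here: with fielddata on a text field, a terms aggregation buckets the individual analyzed tokens of the description, not the description as a whole. A minimal sketch of that alternative, sent to PUT articles_dev1_nl/_mapping, adds a keyword sub-field next to the existing text field. The sub-field name keyword and the ignore_above value are assumptions made for illustration; the limit only needs to exceed the longest description, since longer values are skipped for that sub-field.

```json
{
  "properties": {
    "CharacteristicsMultiValue": {
      "type": "nested",
      "properties": {
        "Description": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 8191
            }
          }
        }
      }
    }
  }
}
```

Documents indexed before the mapping change do not get the sub-field until they are reindexed (for example with POST articles_dev1_nl/_update_by_query). After that, the terms aggregation would point at CharacteristicsMultiValue.Description.keyword instead of CharacteristicsMultiValue.Description.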
You can read more about fielddata vs. keyword in the Elasticsearch docs.
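Since the question only asks for the first description found, another option that avoids the mapping change is a top_hits sub-aggregation, which returns the stored nested object itself instead of aggregating on the field. The sketch below is a request body for GET articles_dev1_nl/_search; the aggregation name FirstCharacteristic is hypothetical, and the Value, Priority and Characteristics_sort sub-aggregations from the original query can stay alongside it.

```json
{
  "query": {
    "query_string": {
      "query": "433398",
      "default_operator": "and"
    }
  },
  "aggs": {
    "CharacteristicsMultiValue": {
      "nested": {
        "path": "CharacteristicsMultiValue"
      },
      "aggs": {
        "Name": {
          "terms": {
            "field": "CharacteristicsMultiValue.Name",
            "size": 25
          },
          "aggs": {
            "FirstCharacteristic": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
```

Each Name bucket then carries one nested hit whose _source holds that characteristic, including Description when the object has one. If a bucket contains several nested objects, the default sort decides which one is returned.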