How to match with multiple inputs in Elasticsearch
I am trying to query all possible logs across 3 environments (dev, test, prod) with the query below, using terms. I have tried both must and should.
curl -vs -X POST 'http://localhost:9200/*/_search?pretty=true' -d '
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": {
        "terms": {
          "can.deployment": ["can-prod", "can-test", "can-dev"]
        }
      },
      "filter": [{
        "range": {
          "@timestamp": {
            "gte": "2020-05-02T17:22:29.069Z",
            "lt": "2020-05-23T17:23:29.069Z"
          }
        }
      }, {
        "terms": {
          "can.level": ["WARN", "ERROR"]
        }
      }, {
        "terms": {
          "can.class": ["MTMessage", "ParserService", "JsonParser"]
        }
      }]
    }
  }
}'
This returns:
{
  "took" : 871,
  "timed_out" : false,
  "_shards" : {
    "total" : 391,
    "successful" : 389,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
However, if I replace terms with match, it works. But then I cannot query for the other inputs, such as WARN messages or logs related to the ParserService class:
curl -vs -X POST 'http://localhost:9200/*/_search?pretty=true' -d '
{
  "query": {
    "bool": {
      "should": [
        {"match": {"can.deployment": "can-prod"}}
      ],
      "filter": [{
        "range": {
          "@timestamp": {
            "gte": "2020-03-20T17:22:29.069Z",
            "lt": "2020-05-01T17:23:29.069Z"
          }
        }
      }, {
        "match": {
          "can.level": "ERROR"
        }
      }, {
        "match": {
          "can.class": "MTMessage"
        }
      }]
    }
  }
}'
How can I accomplish this, with or without terms/match?
I tried this, with no luck. I get 0 search results:
{
  "match": {
    "can.level": "ERROR"
  }
}, {
  "match": {
    "can.level": "WARN"
  }
}, {
  "match": {
    "can.class": "MTMessage"
  }
}
Any hints would certainly help. TIA!
[Edit]
Adding the mapping (/_mapping?pretty=true):
"can" : {
  "properties" : {
    "class" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "deployment" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "level" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
Adding a sample document:
{
  "took" : 50,
  "timed_out" : false,
  "_shards" : {
    "total" : 391,
    "successful" : 387,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 5.44714,
    "hits" : [
      {
        "_index" : "filebeat-6.1.2-2020.05.21",
        "_type" : "doc",
        "_id" : "AXI9K_cggA4T9jvjZc03",
        "_score" : 5.44714,
        "_source" : {
          "@timestamp" : "2020-05-21T02:59:25.373Z",
          "offset" : 34395681,
          "beat" : {
            "hostname" : "4c80d1588455-661e-7054-a4e5-73c821d7",
            "name" : "4c80d1588455-661e-7054-a4e5-73c821d7",
            "version" : "6.1.2"
          },
          "prospector" : {
            "type" : "log"
          },
          "source" : "/var/logs/packages/gateway_mt/1a27957180c2b57a53e76dd686a06f4983bf233f/logs/gateway_mt.log",
          "message" : "[2020-05-21 02:59:25.373] ERROR can_gateway_mt [ActiveMT SNAP Worker 18253] --- ClientIdAuthenticationFilter: Cannot authorize publishing from client ThingPayload_4325334a89c9 : not authorized",
          "fileset" : {
            "module" : "can",
            "name" : "services"
          },
          "fields" : { },
          "can" : {
            "component" : "can_gateway_mt",
            "instancename" : "canservices/0",
            "level" : "ERROR",
            "thread" : "ActiveMT SNAP Worker 18253",
            "message" : "Cannot authorize publishing from client ThingPayload_4325334a89c9 : not authorized",
            "class" : "ClientIdAuthenticationFilter",
            "timestamp" : "2020-05-21 02:59:25.373",
            "deployment" : "can-prod"
          }
        }
      }
    ]
  }
}
Expected output:
I am trying to get a dump of the entire documents that match the criteria, similar to the sample document above.
"query": {
  "bool": {
    "minimum_should_match": 1,
    "should": {
      "terms": {
        "can.deployment": ["can-prod", "can-test", "can-dev"]
      }
    },
    "filter": [{
      "range": {
        "@timestamp": {
          "gte": "2020-05-02T17:22:29.069Z",
          "lt": "2020-05-23T17:23:29.069Z"
        }
      }
    }, {
      "terms": {
        "can.level": ["WARN", "ERROR"]
      }
    }, {
      "terms": {
        "can.class": ["MTMessage", "ParserService", "JsonParser"]
      }
    }]
  }
}
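I suspect the search query above did not work because your fields can.deployment, can.level, and can.class are text fields. Elasticsearch analyzes text fields with the standard analyzer by default, which splits the text into tokens at word boundaries (a hyphen counts as one) and converts every token to lowercase. The Elasticsearch documentation on the standard analyzer has more detail.
In your case, for example, the can.deployment field value can-prod would be analyzed as: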
{
  "tokens": [
    { "token": "can", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0 },
    { "token": "prod", "start_offset": 4, "end_offset": 8, "type": "<ALPHANUM>", "position": 1 }
  ]
}
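A terms query matches exact terms (it is a case-sensitive search), but because Elasticsearch analyzed your text, splitting it and converting it to lowercase, the exact search text no longer exists in the index and cannot be found.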
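To make the mismatch concrete, here is a rough Python sketch. It does not call Elasticsearch; approx_standard_analyze is a hypothetical stand-in that only approximates the standard analyzer (split on non-word characters, lowercase), and terms_matches is an illustrative helper for the exact lookup a terms query performs.

```python
import re


def approx_standard_analyze(text):
    """Approximation only: split on non-word characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def terms_matches(indexed_tokens, wanted_values):
    """A terms query looks for each wanted value verbatim among the indexed tokens."""
    return any(value in indexed_tokens for value in wanted_values)


wanted = ["can-prod", "can-test", "can-dev"]

# text field: the value is split before it is indexed
text_tokens = approx_standard_analyze("can-prod")
print(text_tokens)                          # ['can', 'prod']
print(terms_matches(text_tokens, wanted))   # False

# keyword field: the value is indexed as one untouched token
keyword_tokens = ["can-prod"]
print(terms_matches(keyword_tokens, wanted))  # True
```

The same reasoning applies to can.level: the indexed token is error, so an exact lookup for ERROR finds nothing on the text field.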
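To solve this, when you create the index mapping for these 3 fields (can.deployment, can.level, and can.class), you can add a keyword sub-field, which basically tells Elasticsearch not to analyze the value and to index it as is. The mapping you posted already defines such keyword sub-fields, so you may not need to change anything.
A mapping for these 3 fields could look like this:
Mapping:
{
  "mappings": {
    "properties": {
      "can.class": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "can.deployment": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "can.level": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}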
Now you can run the terms search against these keyword fields:
Search query:
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": {
        "terms": {
          "can.deployment.keyword": ["can-prod", "can-test", "can-dev"]
        }
      },
      "filter": [{
        "terms": {
          "can.level.keyword": ["WARN", "ERROR"]
        }
      }, {
        "terms": {
          "can.class.keyword": ["MTMessage", "ParserService", "JsonParser"]
        }
      }]
    }
  }
}
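If you assemble this request from a script, the body can be built as a plain dictionary. The following is only a sketch: build_log_query is a hypothetical helper name, and it assumes the .keyword sub-fields from the mapping above. It also puts back the @timestamp range filter from the question, which the query above leaves out.

```python
import json


def build_log_query(deployments, levels, classes, gte=None, lt=None):
    """Build a bool query body that runs terms lookups on the keyword sub-fields."""
    filters = [
        {"terms": {"can.level.keyword": levels}},
        {"terms": {"can.class.keyword": classes}},
    ]
    # Optional time window, taken from the range filter in the question.
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lt is not None:
        bounds["lt"] = lt
    if bounds:
        filters.insert(0, {"range": {"@timestamp": bounds}})
    return {
        "query": {
            "bool": {
                "minimum_should_match": 1,
                "should": [
                    {"terms": {"can.deployment.keyword": deployments}}
                ],
                "filter": filters,
            }
        }
    }


body = build_log_query(
    ["can-prod", "can-test", "can-dev"],
    ["WARN", "ERROR"],
    ["MTMessage", "ParserService", "JsonParser"],
    gte="2020-05-02T17:22:29.069Z",
    lt="2020-05-23T17:23:29.069Z",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d, or sent with whichever HTTP client you already use.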
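Note that this terms query only works as a case-sensitive search. The Elasticsearch documentation on the terms query has more detail.
If you want a case-insensitive search, you can use a match query on the text fields to do the same thing:
Search query:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "can.level": "warn error"
          }
        },
        {
          "match": {
            "can.class": "MTMessage ParserService JsonParser"
          }
        },
        {
          "match": {
            "can.deployment": "can-test can-prod can-dev"
          }
        }
      ]
    }
  }
}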
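This works because, by default, Elasticsearch analyzes the match query text with the same analyzer that was used at index time. Since in your case that is the standard analyzer, the match query text is split into tokens and converted to lowercase (the standard analyzer does not remove stop words unless configured to). The Elasticsearch documentation on the match query has more detail.
For example, the search value MTMessage ParserService JsonParser is analyzed internally as: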
{
  "tokens": [
    { "token": "mtmessage", "start_offset": 0, "end_offset": 9, "type": "<ALPHANUM>", "position": 0 },
    { "token": "parserservice", "start_offset": 10, "end_offset": 23, "type": "<ALPHANUM>", "position": 1 },
    { "token": "jsonparser", "start_offset": 24, "end_offset": 34, "type": "<ALPHANUM>", "position": 2 }
  ]
}
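Since the values of this field in your documents were analyzed the same way at index time, they will match.
There is one problem with the value can-test can-prod can-dev, though. It is analyzed as: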
{
  "tokens": [
    { "token": "can", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0 },
    { "token": "test", "start_offset": 4, "end_offset": 8, "type": "<ALPHANUM>", "position": 1 },
    { "token": "can", "start_offset": 9, "end_offset": 12, "type": "<ALPHANUM>", "position": 2 },
    { "token": "prod", "start_offset": 13, "end_offset": 17, "type": "<ALPHANUM>", "position": 3 },
    { "token": "can", "start_offset": 18, "end_offset": 21, "type": "<ALPHANUM>", "position": 4 },
    { "token": "dev", "start_offset": 22, "end_offset": 25, "type": "<ALPHANUM>", "position": 5 }
  ]
}
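Now, if a document like this exists in your index:
{
  "can.deployment": "can",
  "can.level": "WARN",
  "can.class": "JsonParser"
}
then that document will also show up in your search results, because a match query combines its tokens with OR by default and the token can is enough to match.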
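The over-matching can be reproduced without a cluster. This is a simplified model, not Elasticsearch itself: approx_standard_analyze is a hypothetical approximation of the standard analyzer, and match_or mimics only the default OR behaviour of a match query (any shared token counts as a hit, scoring is ignored).

```python
import re


def approx_standard_analyze(text):
    """Approximation only: split on non-word characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def match_or(field_value, query_text):
    """Default match behaviour: a document matches if it shares any token with the query."""
    doc_tokens = set(approx_standard_analyze(field_value))
    query_tokens = set(approx_standard_analyze(query_text))
    return bool(doc_tokens & query_tokens)


query = "can-test can-prod can-dev"

print(match_or("can-prod", query))  # True: intended hit
print(match_or("can", query))       # True: unintended hit via the shared token "can"
print(match_or("staging", query))   # False
```

A terms query on can.deployment.keyword avoids this, because the whole value can-prod is compared as a single term.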
So, depending on the kind of search you want to run and the kind of data you have, you can decide whether to use a terms query or a match query.