Match substring email address at specific location in ELK

Question

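I am trying to find entries that match an email address in the message field, using the Discover section of ELK Kibana. I am currently getting results with: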

@message:"abc@email.com"

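However, the results include other messages that should not match on the email, and I have not been able to build a query that excludes them.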

The results are (data sanitized for security reasons):

@message:[INF] [2020-07-07 12:54:51.105] [PID-1] : [abcdefg] [JID-5c] [data] LIST_LOOKUP: abc@email.com | User List from Profiles | name | user_name @id:355502086986714

@message:[INF] [2020-07-07 12:38:36.755] [PID-2] : [abcdefg] [JID-ed2] [data] LIST_LOOKUP: abc@email.com | User List from Profiles | name | user_name @id:355501869671304

@message:[INF] [2020-07-07 12:19:48.141] [PID-3] [abc@email.com] : [c5] [data] Completed 200 OK in 11ms @id:355501617979964834

@message:[INF] [2020-07-07 11:19:48.930] [PID-5] [abc@email.com] : [542] [data] Completed 200 OK in 9ms @id:35550081535

Whereas I expect them to be:

@message:[INF] [2020-07-07 12:19:48.141] [PID-3] [abc@email.com] : [c5] [data] Completed 200 OK in 11ms @id:355501617979964834

@message:[INF] [2020-07-07 11:19:48.930] [PID-5] [abc@email.com] : [542] [data] Completed 200 OK in 9ms @id:35550081535

I have already tried @message: "[PID-*] [abc@email.com]", @message: "\[PID-*\] \[abc@email.com\] \:", @message: "[abc@email.com]", @message: *abc@email.com* and a few other similar searches, all without success.

Please let me know what I am missing here, and how to do efficient substring searches in ELK Kibana using Discover and KQL/Lucene.

Here is the mapping of my index (I am getting the data from CloudWatch Logs):

{
   "cwl-*":{
      "mappings":{
         "properties":{
            "@id":{
               "type":"string"
            },
            "@log_stream":{
               "type":"string"
            },
            "@log_group":{
               "type":"string"
            },
            "@message":{
               "type":"string"
            },
            "@owner":{
               "type":"string"
            },
            "@timestamp":{
               "type":"date"
            }
         }
      }
   }
}

Answer 1

All of your results contain abc@email.com, so this is expected.

With the default standard analyzer, [abc@email.com] is tokenized as

{
    "tokens": [
        {
            "token": "abc",
            "start_offset": 1,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "email.com",
            "start_offset": 5,
            "end_offset": 14,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}
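
To illustrate why every query variant returns all four messages, here is a minimal Python sketch. The function `toy_standard_tokens` is a hypothetical, much simplified stand-in for the standard analyzer (it is not Lucene's real tokenizer), and `phrase_matches` is an assumed helper that imitates a phrase query by looking for the query tokens in consecutive positions.

```python
import re

def toy_standard_tokens(text):
    """Rough approximation of the standard analyzer for this example only:
    lowercase, then keep runs of letters/digits, allowing dots inside a run
    (so "email.com" stays whole). Brackets and '@' are dropped."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())

def phrase_matches(query, document):
    """True if the query tokens appear consecutively in the document tokens."""
    q, d = toy_standard_tokens(query), toy_standard_tokens(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

messages = [
    "[INF] [2020-07-07 12:54:51.105] [PID-1] : [abcdefg] [JID-5c] [data] "
    "LIST_LOOKUP: abc@email.com | User List from Profiles | name | user_name",
    "[INF] [2020-07-07 12:19:48.141] [PID-3] [abc@email.com] : [c5] [data] "
    "Completed 200 OK in 11ms",
]

# With or without brackets, the query reduces to the same two tokens,
# so both kinds of message match.
for query in ("abc@email.com", "[abc@email.com]"):
    print(query, [phrase_matches(query, m) for m in messages])
```

Because the brackets never reach the index, adding them to the query text cannot narrow the results.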

If you have a dedicated email field, you can query that instead. Otherwise, you need to change the mapping for this field.
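
If no such field exists yet, one way it could be populated, assuming you control the step that ships the logs, is to extract the bracketed address that directly follows the [PID-n] marker before indexing. The pattern `PID_EMAIL` and the helper `extract_email` below are hypothetical names for this sketch, based only on the sample lines in the question.

```python
import re

# Assumed log layout: "[PID-<digits>] [<address>]" as in the sample messages.
PID_EMAIL = re.compile(r"\[PID-\d+\] \[([^\]\s@]+@[^\]\s@]+)\]")

def extract_email(message):
    """Return the bracketed address following [PID-n], or None if absent."""
    match = PID_EMAIL.search(message)
    return match.group(1) if match else None

lookup_line = (
    "[INF] [2020-07-07 12:54:51.105] [PID-1] : [abcdefg] [JID-5c] [data] "
    "LIST_LOOKUP: abc@email.com | User List from Profiles | name | user_name"
)
request_line = (
    "[INF] [2020-07-07 12:19:48.141] [PID-3] [abc@email.com] : [c5] [data] "
    "Completed 200 OK in 11ms"
)

print(extract_email(lookup_line))   # no address right after [PID-1]
print(extract_email(request_line))  # address right after [PID-3]
```

Only lines where the address sits in that position would get a value, so a query on the extracted field would leave out the LIST_LOOKUP lines.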

If this does not answer your question, could you add the mapping for this field? You can retrieve it from http://host:port/indexName/_mapping

Answer 2

As @Gibbs already mentioned, all of your data contains the string abc@email.com. Your mapping now confirms that you are using a string field without an explicit analyzer, so the default standard analyzer is applied.

Instead, you should map the field that holds the email ID to a custom analyzer that uses the UAX URL Email tokenizer, which does not split the email address.

Below is an example of how to create this analyzer.

Mapping with a custom email analyzer

{
    "settings": {
        "analysis": {
            "analyzer": {
                "email_analyzer": {
                    "tokenizer": "my_tokenizer"
                }
            },
            "tokenizer": {
                "my_tokenizer": {
                    "type": "uax_url_email"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "email": {
                "type": "text",
                "analyzer": "email_analyzer"
            }
        }
    }
}

Analyze API request and response

POST http://{{hostname}}:{{port}}/{{index-name}}/_analyze

{
    "analyzer": "email_analyzer",
    "text": "abc@email.com"
}


{
    "tokens": [
        {
            "token": "abc@email.com",
            "start_offset": 0,
            "end_offset": 13,
            "type": "<EMAIL>",
            "position": 0
        }
    ]
}
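
As a rough illustration of the difference, the following Python sketch imitates an email-aware tokenizer. The regular expression `TOKEN` and the function `toy_email_tokens` are assumptions made for this example; they only approximate what uax_url_email does and are not its actual rules. Like the analyzer above, which has no lowercase filter, the sketch leaves case untouched.

```python
import re

# Try an email-shaped token first, then fall back to plain runs of
# letters/digits (with inner dots). Toy approximation only.
TOKEN = re.compile(
    r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
    r"|[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*"
)

def toy_email_tokens(text):
    """Return tokens, keeping any email address as a single token."""
    return TOKEN.findall(text)

print(toy_email_tokens("[abc@email.com]"))
print(toy_email_tokens("[PID-3] [abc@email.com] : [c5]"))
```

The address survives as one token, so a search for abc@email.com no longer depends on the separate abc and email.com terms.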

lucene

elasticsearch

kibana

amazon-elasticsearch