Regex to handle all multiline exceptions in Fluentd (tested on Rubular)

I designed the following regex, in Rubular format, to match every multiline exception or warning message field for the Fluentd parser:

(SLF4J:\s.*|[a-zA-z_]*\..*\.*\s.*\s.*|Caused\sby:\s|\s+at\s.*|\s+\.\.\. (\d)+ more)

However, it also matches fields that should not be matched.

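I want to match from the start of every multiline exception or warning. In short: a multiline block is read from the beginning of its first line until the next JSON line is reached. A JSON line always starts with {". As soon as we see a line that starts with {", we stop reading the multiline block.

Either a single regex that covers both cases or two separate regexes is fine.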
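The stopping rule described above can also be sketched without a regex, which may help when checking what a pattern is expected to return. This is only an illustration under stated assumptions: the class name MultilineGrouper and the method group are hypothetical, and the input is assumed to be already split into lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MultilineGrouper {
    /**
     * Groups consecutive non-JSON lines into blocks.
     * Assumption: a JSON line is any line that starts with {" and it
     * ends the block that is currently being collected.
     */
    static List<String> group(List<String> lines) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("{\"")) {
                // JSON line: close the current block, if there is one.
                if (current.length() > 0) {
                    blocks.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Non-JSON line: append it to the current block.
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        // Flush a block that runs to the end of the input.
        if (current.length() > 0) {
            blocks.add(current.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Abbreviated sample, not the full test string.
        List<String> lines = Arrays.asList(
            "java.lang.InterruptedException: Timeout while waiting for epoch from quorum",
            "        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)",
            "        ... 19 more",
            "{\"log_level\": \"WARN\", \"log_message\": \"PeerState set to LOOKING\"}");
        for (String block : group(lines)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

Unlike the regex approach, this sketch does not separate two different exceptions that follow each other without a JSON line in between.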

Demo links

Demo 1: https://rubular.com/r/O26Wm6mc7z51re

Demo 2: https://rubular.com/r/v6Q7iwZqmNDAAx

The test string is:

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more
{"log_timestamp": "2021-02-18T11:33:23.114+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled)", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "PeerState set to LOOKING"}
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.intam.svc.cluster.local"}
java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local
        at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)
        at org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:764)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:699)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:618)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
        at java.lang.Thread.run(Thread.java:748)
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.sxc.svc.cluster.local"}

Expected match for demo 1: https://rubular.com/r/O26Wm6mc7z51re

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more

Expected match for demo 2: https://rubular.com/r/v6Q7iwZqmNDAAx

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type 

You could use a single pattern with a capture group and a backreference to get both parts:

^(SLF4J:|java\.lang\.InterruptedException:).*(?:\R(?!\1|{).*)*

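The pattern matches:

  • ^ Start of the string, or of each line where ^ is line-anchored (always in Ruby; with Pattern.MULTILINE in Java)
  • (SLF4J:|java\.lang\.InterruptedException:).* Capture either alternative in group 1, then match the rest of the line
  • (?: Non-capture group
    • \R(?!\1|{).* Match a newline, assert that the next line does not start with what was captured in group 1 or with {, then match the rest of that line
  • )* Close the group and optionally repeat it to match all following lines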

Regex demo

See the pattern matching the first part and the second part.

Note that in Java the backslashes must be doubled:

String regex = "^(SLF4J:|java\\.lang\\.InterruptedException:).*(?:\\R(?!\\1|\\{).*)*";
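
As a rough usage sketch, the pattern can be compiled with Pattern.MULTILINE so that ^ matches at the start of every line, and each block collected with Matcher.find(). The class name BackreferenceDemo and the method findBlocks are hypothetical, and the sample log is abbreviated from the test string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BackreferenceDemo {
    // Backslashes are doubled for the Java string literal.
    // MULTILINE makes ^ match at the start of each line, not only at the start of the input.
    static final Pattern BLOCK = Pattern.compile(
        "^(SLF4J:|java\\.lang\\.InterruptedException:).*(?:\\R(?!\\1|\\{).*)*",
        Pattern.MULTILINE);

    /** Returns every block that starts with one of the two alternatives in group 1. */
    static List<String> findBlocks(String log) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(log);
        while (matcher.find()) {
            blocks.add(matcher.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Abbreviated sample: three exception lines followed by one JSON line.
        String log = String.join("\n",
            "java.lang.InterruptedException: Timeout while waiting for epoch from quorum",
            "        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)",
            "        ... 19 more",
            "{\"log_level\": \"WARN\", \"log_message\": \"PeerState set to LOOKING\"}");
        for (String block : findBlocks(log)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

Because the lookahead rejects a following line that starts with the text captured in group 1, consecutive SLF4J: lines come back as separate matches rather than as one block.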

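To avoid running on into another SLF4J line or a different type of exception, treat any dot-separated string at the start of a line as the beginning of a new exception: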

^(?:SLF4J:|\w+(?:\.\w+)+).*(?:\R(?!(?:SLF4J:|\w+(?:\.\w+)+)|{).*)*

Regex demo
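
A minimal Java sketch of the more general pattern, again assuming Pattern.MULTILINE and doubled backslashes. The class name GenericBlockDemo and the method findBlocks are hypothetical, and the sample input is shortened from the test string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GenericBlockDemo {
    // A block starts with "SLF4J:" or a dot-separated name such as java.net.UnknownHostException.
    // It continues over following lines until one starts with "SLF4J:", a dot-separated name, or "{".
    static final Pattern BLOCK = Pattern.compile(
        "^(?:SLF4J:|\\w+(?:\\.\\w+)+).*(?:\\R(?!(?:SLF4J:|\\w+(?:\\.\\w+)+)|\\{).*)*",
        Pattern.MULTILINE);

    /** Returns every SLF4J line or exception block found in the log text. */
    static List<String> findBlocks(String log) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(log);
        while (matcher.find()) {
            blocks.add(matcher.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Shortened sample: one JSON line, then an exception with two stack frames, then JSON again.
        String log = String.join("\n",
            "{\"log_level\": \"WARN\", \"log_message\": \"Failed to resolve address\"}",
            "java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local",
            "        at java.net.InetAddress.getAllByName0(InetAddress.java:1281)",
            "        at java.lang.Thread.run(Thread.java:748)",
            "{\"log_level\": \"WARN\", \"log_message\": \"PeerState set to LOOKING\"}");
        for (String block : findBlocks(log)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

Stack-trace lines are kept inside a block because they begin with whitespace, so neither alternative in the lookahead can match at the start of those lines.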