Trying to capture value within a captured value

Question

I'm trying to parse data from a line like this:

"Lorem ipsum dolor sit amet, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\r\n adipiscing elit, sed do eiusmod\r\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud"

I'm trying to capture values like this:

message: "Lorem ipsum dolor sit amet, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\r\n adipiscing elit, sed do eiusmod\r\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud"
ip: "111.111.111.111, 222.222.222.222, 333.333.333.333"

There can be any number of IPs, including zero.

I'm using Fluent Bit with a single regex. Here is an example of a Fluent Bit parser definition:

[PARSER]
Name syslog-rfc3164
Format regex
Regex /^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
Time_Key    time
Time_Format %b %d %H:%M:%S
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep   On

Thanks to Cary and Aleksei, here is the solution:

\A(?<whole>.*?((?<=IP: )(?<ip>(?<four_threes>\d{1,3}(?:\.\d{1,3}){3})(?:, \g<four_threes>)*)).*?)\z

https://rubular.com/r/Kgh5EXMCA0lkew

Edit

I realized that some strings don't contain the "IP: ..." pattern, and those give me a parse error.

string1: "Lorem ipsum dolor sit amet, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\r\n adipiscing elit, sed do eiusmod\r\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud"

string2: "Lorem ipsum dolor sit amet, \r\n adipiscing elit, sed do eiusmod\r\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud"

I tried applying * (0 or more) to the ip named group, but I couldn't get it to work. Any idea how I could do this?

Answer 1

You can use /(?:[0-9]+\.){3}[0-9]+/ as a very basic regex (there are better IPv4 regexes out there).

Then, by calling .scan(...) on the string, you get the results as an array.
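As a minimal sketch of the scan approach: the pattern below is only an illustrative assumption (four dot-separated runs of digits, with no range checking), and the constant and variable names are made up for this example. The group is non-capturing because String#scan returns the captured groups instead of the whole match when the pattern contains capture groups.

```ruby
# Illustrative pattern (an assumption): four dot-separated runs of digits.
# The group is non-capturing so that scan returns whole matches.
IPV4_LIKE = /(?:[0-9]+\.){3}[0-9]+/

with_ips    = "Lorem ipsum, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\r\n adipiscing elit"
without_ips = "Lorem ipsum, \r\n adipiscing elit"

ips  = with_ips.scan(IPV4_LIKE)     # array of matched substrings
none = without_ips.scan(IPV4_LIKE)  # empty array when nothing matches

ip_field = ips.join(", ")           # rebuild the comma-separated "ip" value

p ips
p none
p ip_field
```

Because scan simply returns an empty array when nothing matches, the zero-IP case needs no special handling here. This runs in Ruby, though, not inside a single Fluent Bit parser regex.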

Answer 2

str = 'Lorem, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\r\n adipiscing'

r = /
    \A                     # match the beginning of the string
    (?<whole>              # begin named group 'whole' 
      .*?                  # match >= 0 characters 
      (?<ip>               # begin named group 'ip'
        (?<four_threes>    # begin a named group 'four_threes'
          \d{1,3}          # match 1-3 digits
          (?:              # begin a non-capture group
            \.             # match a period
            \d{1,3}        # match 1-3 digits
          ){3}             # close non-capture group and execute same 3 times
        )                  # close capture group 'four_threes'
        (?:                # begin a non-capture group
          ,\p{Space}       # match ', '
          \g<four_threes>  # execute subexpression named 'four_threes'
        )*                 # close non-capture group and execute same >= 0 times
      )                    # close capture group 'ip'
      .*                   # match >= 0 characters
    )                      # close capture group 'whole'
    /x                     # free-spacing regex definition mode

m = str.match(r)
m[:whole] 
  # str is single-quoted, so \r and \n are literal backslash sequences here
  #=> "Lorem, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\\r\\n adipiscing"
m[:ip]
  #=> "111.111.111.111, 222.222.222.222, 333.333.333.333"

The regular expression is conventionally written:

/\A(?<whole>.*?(?<ip>(?<four_threes>\d{1,3}(?:\.\d{1,3}){3})(?:, \g<four_threes>)*).*)/

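When a regex is defined in free-spacing mode, spaces must be protected in some way; otherwise they are stripped out before the expression is parsed. I used \p{Space}, but [[:space:]], \s and [ ] (a space in a character class) could be used as well. (All but the last match any whitespace character.) When the regex is written conventionally, a plain space can be used, as shown above.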
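The effect of free-spacing mode on spaces can be checked with a few tiny patterns. This is only a sketch; the variable names and the "a, b" test strings are invented for illustration.

```ruby
# In /x mode an unprotected space is dropped from the pattern.
unprotected = /a, b/x             # behaves like /a,b/
char_class  = /a,[ ]b/x           # [ ] matches exactly one space character
any_space   = /a,\sb/x            # \s matches any whitespace character
prop_space  = /a,\p{Space}b/x     # so does \p{Space}
posix_space = /a,[[:space:]]b/x   # and so does [[:space:]]

results = {
  unprotected_with_space: unprotected.match?("a, b"),
  unprotected_no_space:   unprotected.match?("a,b"),
  char_class_space:       char_class.match?("a, b"),
  char_class_tab:         char_class.match?("a,\tb"),
  any_space_tab:          any_space.match?("a,\tb"),
  prop_space_tab:         prop_space.match?("a,\tb"),
  posix_space_tab:        posix_space.match?("a,\tb")
}

p results
```

The unprotected pattern fails on "a, b" and matches "a,b", while [ ] accepts a space but rejects a tab. That is the difference between the last alternative and the other three.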

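\g<four_threes> is a subexpression call (search for "Subexpression Calls"). Using them saves typing and reduces the chance of errors. If this third named capture is not wanted, it can of course be replaced by writing the subexpression out in full.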
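For the lines in the question's edit that contain no "IP: ..." part, one possible approach is to wrap everything up to and including the IP list in an optional non-capture group. The following is a sketch under assumptions: the constant name OPTIONAL_IP and the shortened test strings are invented here, the m flag is added so that . also matches real newline characters, and whether Fluent Bit's parser accepts the pattern unchanged has not been checked.

```ruby
OPTIONAL_IP = /
  \A
  (?<whole>
    (?:                            # optional prefix that ends in an IP list
      .*?IP:[ ]
      (?<ip>
        (?<four_threes>\d{1,3}(?:\.\d{1,3}){3})
        (?:,[ ]\g<four_threes>)*
      )
    )?                             # skipped when no IP list is present
    .*                             # the rest of the string, or all of it
  )
  \z
/mx

string1 = "Lorem ipsum dolor sit amet, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\r\n adipiscing elit"
string2 = "Lorem ipsum dolor sit amet, \r\n adipiscing elit"

m1 = string1.match(OPTIONAL_IP)
m2 = string2.match(OPTIONAL_IP)

p m1[:ip]      # the comma-separated IP list
p m2[:ip]      # nil, because the optional group did not participate
p m2[:whole]   # the whole string is still captured
```

With this shape both strings match, and only the ip capture is missing for the second one. Applying * directly to the ip group would not help, since the literal "IP: " in front of it would still be required.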

ruby

regex

fluent-bit