Regex Group, problems with catching IPs

Question

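I have slightly altered the log for this post.

I have a regex that matches three different groups in a log line: the time, the IP, and the number of messages the SMTP server received.

I tried the following regex: (\d{2}.\d{2}.\d{4} \d{2}:\d{2}:\d{2}).*(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})..disconnected.?\s+(\d+) message[s]

The only problem is with the second group, the IP. To illustrate: in the first line the IP is 11.132.8.61, but what regexr catches is only 1.132.8.6, so some digits are missing. I thought \d{1,3} would match all three (or two) digits when there is more than one. It does so in the second octet, but not in the first or the last.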

[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-07F8] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:00:08   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:04:51   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:30:46   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:30:46   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-0780] 01.12.2020 01:33:25   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:33:25   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received

[12CC:0015-118C] 30.11.2020 05:08:59   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000B-118C] 30.11.2020 05:08:59   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000F-0FF0] 30.11.2020 05:08:59   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000F-120C] 30.11.2020 05:10:05   SMTP Server: bsicip03.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:0015-118C] 30.11.2020 05:10:05   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:0014-118C] 30.11.2020 05:10:05   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000B-120C] 30.11.2020 05:10:05   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000A-120C] 30.11.2020 05:10:05   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received

The expected output would be:
match[1] = 01.12.2020 01:00:07
match[2] = 11.132.8.61
match[3] = 1

Answer 1

With your shown samples, please try the following regex. It creates 3 capturing groups that hold the values to be extracted.

^\[\S+\s+(\d{1,2}\.\d{2}\.\d{4})\s+(?:\d{2}:){2}\d{2}\s+SMTP\s+Server:\s*(?:\S+\s*\()?((?:\d+\.){3}\d+)\)?\s+[\w.-]+\s+(\d+).*$


Explanation: a detailed breakdown of the regex above.

^\[\S+\s+                ##Match a literal [ at the start, 1 or more non-space characters, then 1 or more spaces.
(\d{1,2}\.\d{2}\.\d{4})  ##1st capturing group: 1 to 2 digits, a dot, 2 digits, a dot, then 4 digits (the date).
\s+(?:\d{2}:){2}\d{2}    ##Match 1 or more spaces, then 2 digits plus a colon twice, then 2 digits (the time, not captured).
\s+SMTP\s+Server:\s*     ##Match 1 or more spaces, SMTP, 1 or more spaces, Server:, then 0 or more spaces.
(?:\S+\s*\()?            ##Optional non-capturing group: 1 or more non-spaces, 0 or more spaces, then a literal ( (the host name, if present).
((?:\d+\.){3}\d+)\)?     ##2nd capturing group: 3 runs of digits each followed by a dot, then digits (the IP); after it, an optional ).
\s+[\w.-]+\s+            ##Match 1 or more spaces, 1 or more of \w, . or - (here: disconnected.), then 1 or more spaces.
(\d+)                    ##3rd capturing group: 1 or more digits (the message count).
.*$                      ##Match anything up to the end of the line.
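
As a minimal sketch of how this regex might be applied in PowerShell: the -match operator fills the automatic $Matches variable with the capture groups. The variable names $lines, $regex and $results are illustrative only and not part of the answer. Note that, as written, group 1 holds only the date; the time is matched but not captured.

```powershell
# Two sample lines from the question, one per log format.
$lines = @(
  '[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received'
  '[12CC:0015-118C] 30.11.2020 05:08:59   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received'
)

# The regex from this answer, unchanged.
$regex = '^\[\S+\s+(\d{1,2}\.\d{2}\.\d{4})\s+(?:\d{2}:){2}\d{2}\s+SMTP\s+Server:\s*(?:\S+\s*\()?((?:\d+\.){3}\d+)\)?\s+[\w.-]+\s+(\d+).*$'

$results = foreach ($line in $lines) {
  if ($line -match $regex) {
    [pscustomobject] @{
      Date  = $Matches[1]   # date only; the time sits outside the group
      IP    = $Matches[2]
      Count = $Matches[3]
    }
  }
}
$results
```

If the time should be part of group 1, as in the question's expected output, one option would be to move the time pattern inside the first group: (\d{1,2}\.\d{2}\.\d{4}\s+(?:\d{2}:){2}\d{2}).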

Answer 2

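Change .* to .*? (or, given that you can expect at least one character between the capture groups, .+?) to make the subexpression non-greedy.

That way, .* won't "steal" up to two leading digits from what the following \d{1,3} subexpression would otherwise match.

A simple example: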

# !! BROKEN: greedy.
PS> if (' 123' -match '.*(\d{1,3})') { $Matches[1] }

3 # !! Only the LAST digit matched, because .* matched as much as it
  # !! could while still matching \d{1,3}

# OK: non-greedy.
PS> if (' 123' -match '.*?(\d{1,3})') { $Matches[1] }

123 # OK - all 3 digits matched, because .*? matched as little as it
    # could while still matching \d{1,3}
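
The same effect can be reproduced on the IP from the question. The following is a sketch only: it assumes the dots inside the IP subexpression are escaped, and the variable names ($line, $ip, $greedy, $lazy) are made up for illustration. With the greedy pattern, .* takes the first "1" of "11", and the ".." before disconnected takes the final "1" of "61" together with the space.

```powershell
$line = '[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received'

# IP subexpression with the dots escaped (an assumption about the original regex).
$ip = '(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

# Greedy: .* swallows the first "1", and the trailing ".." consumes the last "1".
$greedy = if ($line -match ('.*' + $ip + '..disconnected')) { $Matches[1] }

# Non-greedy: .*? and .+? consume as little as possible, leaving the IP intact.
$lazy = if ($line -match ('.*?' + $ip + '.+?disconnected')) { $Matches[1] }

"greedy: $greedy"   # greedy: 1.132.8.6
"lazy:   $lazy"     # lazy:   11.132.8.61
```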

Putting it all together (note that I've used .+?, also in place of the .. before disconnected):

'[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received',
'[12CC:0015-118C] 30.11.2020 05:08:59   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received' |
  ForEach-Object {
    if ($_ -match '(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}).+?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).+?disconnected\.?\s+(\d+) message\[s\]') {
      [pscustomobject] @{
        Count = $Matches[3]
        Timestamp = $Matches[1]
        IP = $Matches[2]
      }
    }
  }

The above yields:

Count Timestamp           IP
----- ---------           --
1     01.12.2020 01:00:07 11.132.8.61
1     30.11.2020 05:08:59 12.99.81.53

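Note:

Generally (though probably not necessary in your case), you can make the regex more robust by surrounding subexpressions such as \d{1,3} with word-boundary assertions (\b), so that they don't accidentally match part of a longer run of digits. Alternatively, you can explicitly require a non-digit (\D) before and after.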
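
A minimal sketch of the word-boundary idea, assuming \b is placed around the whole IP subexpression; the variable names and the input '1234.5.6.7' are made up for illustration:

```powershell
$line = '[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received'

# \b on both sides: the match may not start or end in the middle of a run of digits.
$bounded = '\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b'

# Even with a greedy .* in front, the leading "1" can no longer be cut off.
$ip = if ($line -match ('.*' + $bounded)) { $Matches[1] }
"IP: $ip"   # IP: 11.132.8.61

# A made-up input with too many digits in the first part is rejected with \b ...
$withBoundary = '1234.5.6.7' -match $bounded                                    # False
# ... whereas the unbounded pattern matches the tail "234.5.6.7".
$withoutBoundary = '1234.5.6.7' -match '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'     # True
```

This works because the position between two digits is never a word boundary, so the engine cannot begin or end the IP match inside "11" or "61".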

Alternative solution with the -split operator:

As Lee Daley points out, you could use -split, the string splitting operator, to split your lines into fields, as a conceptually simpler alternative to a regex:

'[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received',
'[12CC:0015-118C] 30.11.2020 05:08:59   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received' |
  ForEach-Object {
    $fields = -split $_
    if ($fields[-4] -eq 'disconnected.') {
      [pscustomobject] @{
        Count     = $fields[-3]
        Timestamp = '{0} {1}' -f $fields[1], $fields[2]
        IP        = $fields[-5].Trim('()')
      }
    }
  }

The above yields the same result as the regex-based solution.
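
To see why indexing from the end handles both line formats, here is a small sketch that inspects the fields of one sample line (the variable names are illustrative). The optional host name shifts the positions counted from the start, but not those counted from the end.

```powershell
$line = '[12CC:0015-118C] 30.11.2020 05:08:59   SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received'

# Unary -split breaks the line on runs of whitespace.
$fields = -split $line

$fields.Count    # 11 fields here; a line without a host name has 10
$fields[-5]      # (12.99.81.53)
$fields[-4]      # disconnected.
$fields[-3]      # 1
```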
