Regex Group, problems with catching IPs
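I have slightly altered the logs for this post.
I have a regex that matches 3 different groups in a log line: the time, the IP, and the number of messages received by the SMTP server.
I tried it with the following regex:
(\d{2}.\d{2}.\d{4} \d{2}:\d{2}:\d{2}).*(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})..disconnected.?\s+(\d+) message[s]
The problem is only with the 2nd group, the IP.
In the first line the IP is 11.132.8.61, but what regexr catches is only 1.132.8.6
So it leaves out some digits. I thought that \d{1,3} would match all two or three digits if there is more than one. It does so in the second part, but not in the first or the last.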
[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-07F8] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:00:08 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:04:51 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:30:46 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:30:46 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-0780] 01.12.2020 01:33:25 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:33:25 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000B-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000F-0FF0] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000F-120C] 30.11.2020 05:10:05 SMTP Server: bsicip03.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:0015-118C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:0014-118C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000B-120C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000A-120C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
The expected output would be:
match[1] = 01.12.2020 01:00:07
match[2] = 11.132.8.61
match[3] = 1
With your shown samples, please try the following regex. It creates 3 capturing groups that hold the values to be extracted.
^\[\S+\s+(\d{1,2}\.\d{2}\.\d{4})\s+(?:\d{2}:){2}\d{2}\s+SMTP\s+Server:\s*(?:\S+\s*\()?((?:\d+\.){3}\d+)\)?\s+[\w.-]+\s+(\d+).*$
Explanation: a detailed breakdown of the regex above.
^\[\S+\s+                  ##From the start of the line: a literal [, 1 or more non-space characters, then 1 or more spaces.
(\d{1,2}\.\d{2}\.\d{4})    ##1st capturing group (the date): 1 to 2 digits, a dot, 2 digits, a dot, 4 digits.
\s+(?:\d{2}:){2}\d{2}      ##1 or more spaces, then the time, matched but not captured: "2 digits and a colon" twice, then 2 digits.
\s+SMTP\s+Server:\s*       ##1 or more spaces, SMTP, 1 or more spaces, Server:, then 0 or more spaces.
(?:\S+\s*\()?              ##Optional non-capturing group (the host name): 1 or more non-spaces, 0 or more spaces, then a literal (.
((?:\d+\.){3}\d+)\)?       ##2nd capturing group (the IP): "digits and a dot" 3 times, then digits; followed by an optional literal ).
\s+[\w.-]+\s+              ##Spaces, then 1 or more word characters, dots, or hyphens (here: disconnected.), then spaces.
(\d+)                      ##3rd capturing group (the message count): 1 or more digits.
.*$                        ##Anything up to the end of the line.
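To try the regex above against the sample lines, a minimal PowerShell sketch follows. Note that its 1st group captures only the date, whereas the question expects date and time together; `$regexWithTime` is a hypothetical variant (not part of the answer) that moves the group's closing parenthesis behind the time part. The variable names are illustrative only.

```powershell
# Two sample lines taken from the question.
$lines = @(
  '[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received'
  '[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received'
)

# The regex from above, unchanged: group 1 holds the date only.
$regex = '^\[\S+\s+(\d{1,2}\.\d{2}\.\d{4})\s+(?:\d{2}:){2}\d{2}\s+SMTP\s+Server:\s*(?:\S+\s*\()?((?:\d+\.){3}\d+)\)?\s+[\w.-]+\s+(\d+).*$'
$dateOnly = if ($lines[0] -match $regex) { $Matches[1] }   # 01.12.2020

# Hypothetical variant: group 1 spans both the date and the time.
$regexWithTime = '^\[\S+\s+(\d{1,2}\.\d{2}\.\d{4}\s+(?:\d{2}:){2}\d{2})\s+SMTP\s+Server:\s*(?:\S+\s*\()?((?:\d+\.){3}\d+)\)?\s+[\w.-]+\s+(\d+).*$'

$results = foreach ($line in $lines) {
  if ($line -match $regexWithTime) {
    [pscustomobject] @{
      Timestamp = $Matches[1]
      IP        = $Matches[2]
      Count     = $Matches[3]
    }
  }
}
$results
```

Both sample lines should match, with and without the host name in front of the IP, because the host-name part and the closing parenthesis are optional in the pattern.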
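Change .* to .*? (or, assuming that you can expect at least one character between the capture groups, .+?) to make the subexpression non-greedy.
That way, .* won't "steal" up to two leading digits from what the following \d{1,3} subexpression matches.
A simple example: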
# !! BROKEN: greedy.
PS> if (' 123' -match '.*(\d{1,3})') { $Matches[1] }
3 # !! Only the LAST digit matched, because .* matched as much as it
# !! could while still matching \d{1,3}
# OK: non-greedy.
PS> if (' 123' -match '.*?(\d{1,3})') { $Matches[1] }
123 # OK - all 3 digits matched, because .*? matched as little as it
# could while still matching \d{1,3}
Putting it all together (note that I've used .+? also in place of the .. before disconnected):
'[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received',
'[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received' |
  ForEach-Object {
    if ($_ -match '(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}).+?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).+?disconnected\.?\s+(\d+) message\[s\]') {
      [pscustomobject] @{
        Count     = $Matches[3]
        Timestamp = $Matches[1]
        IP        = $Matches[2]
      }
    }
  }
The above yields:
Count Timestamp           IP
----- ---------           --
1     01.12.2020 01:00:07 11.132.8.61
1     30.11.2020 05:08:59 12.99.81.53
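To see why both changes matter, here is a reduced sketch that uses only the part of a log line around the IP. The greedy .* costs the leading digit, and the fixed two-character gap (..) before disconnected costs the trailing one, because only a single space separates the IP from that word. The variable names are illustrative only.

```powershell
# Reduced input: just the part of a log line around the IP.
$text = 'SMTP Server: 11.132.8.61 disconnected.'
$ip   = '(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

# Greedy .* plus a fixed two-character gap (..), as in the question.
$greedy = if ($text -match ('.*' + $ip + '..disconnected')) { $Matches[1] }      # 1.132.8.6

# Non-greedy .*? restores the leading digit, but .. still consumes the trailing "1".
$lazyOnly = if ($text -match ('.*?' + $ip + '..disconnected')) { $Matches[1] }   # 11.132.8.6

# Non-greedy on both sides, with a variable-length gap: the full IP is captured.
$fixed = if ($text -match ('.*?' + $ip + '.+?disconnected')) { $Matches[1] }     # 11.132.8.61

$greedy, $lazyOnly, $fixed
```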
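Note:
- Generally (probably not necessary in your case), you can make the regex more robust by placing word-boundary assertions (\b) around subexpressions such as \d{1,3}, so that they don't accidentally match part of a longer run of digits, or you can explicitly require a non-digit (\D) before and after.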
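A small sketch of the boundary idea, using a made-up input string that is not from the logs. The lookaround form shown last is my own variation on requiring a non-digit on both sides: unlike a literal \D, it does not consume a character and also works at the very start or end of the string.

```powershell
$octets = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# Made-up input: the first run has 4 digits, so this is not a plausible IPv4 address.
$text = 'id 1234.5.6.7'

# Without boundaries, the pattern matches from the middle of the 4-digit run.
$loose = if ($text -match $octets) { $Matches[0] }                   # 234.5.6.7

# With \b on both sides, there is no match at all.
$bounded = $text -match ('\b' + $octets + '\b')                      # False

# Variation: lookarounds that forbid an adjacent digit on either side.
$lookaround = $text -match ('(?<!\d)' + $octets + '(?!\d)')          # False

# A well-formed address inside parentheses still matches with \b.
$valid = if ('(12.99.81.53)' -match ('\b' + $octets + '\b')) { $Matches[0] }   # 12.99.81.53
```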
Alternative solution with the -split operator:
As Lee Daley points out, you could use -split, the string splitting operator, to split your lines into fields, as a conceptually simpler alternative to regexes:
'[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received',
'[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received' |
  ForEach-Object {
    $fields = -split $_
    if ($fields[-4] -eq 'disconnected.') {
      [pscustomobject] @{
        Count     = $fields[-3]
        Timestamp = '{0} {1}' -f $fields[1], $fields[2]
        IP        = $fields[-5].Trim('()')
      }
    }
  }
The above yields the same result as the regex-based solution.
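If typed values are more useful than strings, the split fields can be converted as in the sketch below. This is an optional extension, not part of the answer above; it assumes the timestamps are day.month.year (consistent with 30.11.2020 in the samples) and that the IP field is always a valid address.

```powershell
$line = '[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received'

# Unary -split splits on runs of whitespace.
$fields = -split $line

# Negative indices count from the end, so they work with or without a host name before the IP.
$timestampText = '{0} {1}' -f $fields[1], $fields[2]

$record = [pscustomobject] @{
  Count     = [int] $fields[-3]
  # Assumed format: day.month.year, 24-hour time.
  Timestamp = [datetime]::ParseExact($timestampText, 'dd.MM.yyyy HH:mm:ss', [cultureinfo]::InvariantCulture)
  IP        = [ipaddress] $fields[-5].Trim('()')
}
$record
```

With typed properties, the records can then be sorted or filtered by date or count without further string handling.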