Only match IP addresses and not other numbers
I want the following regex code to return only the IP address, and not to return the other numeric values in the source text as IPs.
Code:
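import re
# The log line must be a string literal; single quotes avoid clashing with the inner double quotes.
logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
# [\d.]+ matches any run of digits and dots, so plain numbers are returned as well.
for item in re.finditer(r"(?P<host>[\d.]+)", logdata):
    print(item.groupdict())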
Desired output:
{'host': '146.204.224.152'}
Unwanted output (one of the extra matches):
{'host': '6811'}
I think this should do it:
(?P<host>(\d+\.){3}\d+)
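As a quick check of that pattern, here is a minimal sketch run against the log line from the question. The inner group is written as non-capturing, (?:...), which is my own small change and not part of the suggestion above. Note that the pattern only checks the shape (four digit runs joined by three dots), not that each number is in the 0-255 range.

```python
import re

logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'

# Four runs of digits separated by three literal dots.
simple_ip = re.compile(r"(?P<host>(?:\d+\.){3}\d+)")

# Collect every match; lone numbers such as 6811 or 302 have no dots and are skipped.
hosts = [m.group("host") for m in simple_ip.finditer(logdata)]
print(hosts)
```

The "1.1" in "HTTP/1.1" is not picked up either, because it has only one dot.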
Use
import re
logdata = r'146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
for item in re.finditer(r"\b(?P<host>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})\b", logdata):
    print(item.groupdict())
Result: {'host': '146.204.224.152'}
See Extract ip addresses from Strings using regex.
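To see what the octet alternation buys you over a plain digits-and-dots pattern, here is a rough comparison. The sample string is made up for illustration, and the names loose, strict and is_ipv4 are mine. The last part uses the standard library ipaddress module as an alternative way to validate candidates; that is a different technique from the regex above, shown only for comparison.

```python
import ipaddress
import re

# Hypothetical sample text, not taken from the question's log.
sample = "bad 256.1.1.1 and 999.300.1.2, good 10.0.0.1"

# Shape-only pattern: four digit runs joined by dots, any values allowed.
loose = re.compile(r"(?:\d+\.){3}\d+")

# Range-checked pattern: each octet is limited to 0-255.
octet = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
strict = re.compile(rf"\b{octet}(?:\.{octet}){{3}}\b")

loose_hits = loose.findall(sample)
strict_hits = strict.findall(sample)


def is_ipv4(text):
    """Return True if the standard library accepts text as an IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


# Alternative: match loosely, then let ipaddress reject out-of-range values.
checked_hits = [hit for hit in loose_hits if is_ipv4(hit)]

print(loose_hits)
print(strict_hits)
print(checked_hits)
```

On this sample the loose pattern returns all three candidates, while the strict pattern and the ipaddress filter both keep only 10.0.0.1.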
To get host and time from a log line like the one you have:
import re
logdata = r'146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
match_data = re.search(r'^(?P<host>\S+).*?\[(?P<time>.*?)]', logdata)
if match_data:
    print(match_data.groupdict())
Explanation
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?P<host>                group and capture to (?P=host):
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of (?P=host)
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (?P<time>                group and capture to (?P=time):
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of (?P=time)
--------------------------------------------------------------------------------
  ]                        ']'
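If you want that breakdown to live next to the code, one option is to write the same pattern with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is just a sketch of the same regex; the names log_pattern and fields are mine.

```python
import re

logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'

log_pattern = re.compile(
    r"""
    ^                   # beginning of the string
    (?P<host>\S+)       # host: one or more non-whitespace characters
    .*?                 # as few characters as possible, up to the bracket
    \[                  # a literal opening square bracket
    (?P<time>.*?)       # time: as few characters as possible
    ]                   # a literal closing square bracket
    """,
    re.VERBOSE,
)

match_data = log_pattern.search(logdata)
# Fall back to an empty dict when the line does not match.
fields = match_data.groupdict() if match_data else {}
print(fields)
```

For the log line above this gives the host as '146.204.224.152' and the time as '21/Jun/2019:15:45:24 -0700'.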