Re - ignoring some lines
I am trying to convert my data into a list of dictionaries, for example:
example_dict = {"host":"146.204.224.152",
"user_name":"feest6811", #note: sometimes the user name is missing! In this case, use '-' as the value for the username.
"time":"21/Jun/2019:15:45:24 -0700",
"request":"POST /incentivize HTTP/1.1"} #note: not everything is a POST
My data:
86.187.99.249 - tillman6650 [21/Jun/2019:15:46:03 -0700] "POST /efficient/unleash HTTP/1.1" 405 22390
76.72.133.93 - carroll1056 [21/Jun/2019:15:46:05 -0700] "POST /morph/optimize/plug-and-play HTTP/2.0" 400 27172
73.162.151.229 - dubuque3528 [21/Jun/2019:15:46:08 -0700] "DELETE /transition/holistic/e-business HTTP/2.0" 301 13923
13.112.8.80 - rau5026 [21/Jun/2019:15:46:09 -0700] "HEAD /ubiquitous/transparent HTTP/1.1" 200 16928
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149
219.194.113.255 - - [21/Jun/2019:15:46:12 -0700] "PATCH /next-generation/niches/mindshare HTTP/1.0" 503 20246
59.101.239.174 - brekke3293 [21/Jun/2019:15:46:13 -0700] "DELETE /ubiquitous/seize/web-enabled HTTP/2.0" 302 14017
My code:
import re

pattern = r"""
(?P<host>.*) #User host
(-\ ) #Separator
(?P<user_name>\w*) #User name
(\ \[) #Separator for bracket and space
(?P<time>\S*\ -0700) #time
(\]\ ) #Separator for bracket and space
(?P<request>.*") #request
"""
for user in re.finditer(pattern,logdata,re.VERBOSE):
    print(user.groupdict())
Output:
{'host': '86.187.99.249 ', 'user_name': 'tillman6650', 'time': '21/Jun/2019:15:46:03 -0700', 'request': '"POST /efficient/unleash HTTP/1.1"'}
{'host': '76.72.133.93 ', 'user_name': 'carroll1056', 'time': '21/Jun/2019:15:46:05 -0700', 'request': '"POST /morph/optimize/plug-and-play HTTP/2.0"'}
{'host': '73.162.151.229 ', 'user_name': 'dubuque3528', 'time': '21/Jun/2019:15:46:08 -0700', 'request': '"DELETE /transition/holistic/e-business HTTP/2.0"'}
{'host': '13.112.8.80 ', 'user_name': 'rau5026', 'time': '21/Jun/2019:15:46:09 -0700', 'request': '"HEAD /ubiquitous/transparent HTTP/1.1"'}
{'host': '136.195.158.6 ', 'user_name': 'feeney9464', 'time': '21/Jun/2019:15:46:11 -0700', 'request': '"HEAD /open-source/markets HTTP/2.0"'}
{'host': '59.101.239.174 ', 'user_name': 'brekke3293', 'time': '21/Jun/2019:15:46:13 -0700', 'request': '"DELETE /ubiquitous/seize/web-enabled HTTP/2.0"'}
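In the given data, some user names are "-", and my code simply skips those lines. I need to include those lines as well, using "-" as the value for the user name.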
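You can change your current user_name regex to
(?P<user_name>[\w\-]*)
\w matches only letters, digits, and the underscore, so \w* cannot match the "-" placeholder and those lines never match. Inside a character class the - symbol has a special meaning: it denotes a range (for example, [0-9] matches any digit from 0 to 9). To match it literally there, escape it with \ (or place it first or last in the class, as in [\w-]).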
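To see the change in action, here is a minimal sketch run against three lines copied from the question. Beyond the user_name fix, it makes a few adjustments that are my own assumptions and not part of the answer above: host uses \S+ so the trailing space is dropped, time captures everything between the brackets, and request excludes the surrounding quotes so the result looks like example_dict. The name entries is just an illustrative variable.

```python
import re

# Three sample lines copied from the question's data; two have "-" as the user name.
logdata = """86.187.99.249 - tillman6650 [21/Jun/2019:15:46:03 -0700] "POST /efficient/unleash HTTP/1.1" 405 22390
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
219.194.113.255 - - [21/Jun/2019:15:46:12 -0700] "PATCH /next-generation/niches/mindshare HTTP/1.0" 503 20246
"""

pattern = r"""
(?P<host>\S+)           # host: a run of non-space characters (assumption: no trailing space wanted)
\ -\                    # separator: space, hyphen, space
(?P<user_name>[\w-]+)   # user name; the class also accepts a lone hyphen for a missing name
\ \[                    # space and opening bracket
(?P<time>[^\]]+)        # everything up to the closing bracket
\]\ "                   # closing bracket, space, opening quote
(?P<request>[^"]*)      # request text without the surrounding quotes
"                       # closing quote
"""

# Build the list of dictionaries, one per matched log line.
entries = [m.groupdict() for m in re.finditer(pattern, logdata, re.VERBOSE)]

for entry in entries:
    print(entry)
```

With this pattern all three lines are matched, and the two lines without a user name come back with user_name set to '-'. If you prefer to keep your original pattern, changing only \w* to [\w\-]* is enough to stop those lines from being skipped.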