Convert access.log to JSON format using Python
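**This is my Python code; I am trying to convert NGINX logs.
I read the logs from the access.log file and use a regular expression to convert them to JSON format. I also need to upload these logs to Elasticsearch, so guidance on that would be appreciated too. I am new to both.**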
import json
import re

i = 0
result = {}

with open('access.log') as f:
    lines = f.readlines()

regex = '([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'

for line in lines:
    r = re.match(regex, line)
    if len(r) >= 6:
        result[i] = {'IP address': r[0], 'Time Stamp': r[1], 'HTTP status': r[2],
                     'Return status': r[3], 'Browser Info': r[4]}
        i += 1

print(result)

with open('data.json', 'w') as fp:
    json.dump(result, fp)
I am getting the following error:
Traceback (most recent call last):
  File "/home/zain/Downloads/stack.py", line 17, in <module>
    if len(r) >= 6:
TypeError: object of type 'NoneType' has no len()
This is the log format:
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET / HTTP/1.1" 200 3437 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /icons/openlogo-75.png HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /favicon.ico HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
The expected output is:
IP Address: 127.0.0.1 Time Stamp: 23/May/2022:22:44:14 HTTP Status: "GET / HTTP/1.1" Return Status: 200 3437 Browser Info: "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
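The traceback appears to have two causes. `re.match` returns `None` when the pattern does not match, and a match object has no length even when it does. With the sample lines the pattern fails because it expects a literal `-` after the status code, where the log has the response size (for example `3437`). Below is a minimal sketch of this, assuming the log format shown above; the names `LINE`, `original` and `corrected` and the adjusted pattern are illustrative, not the only possible fix.

```python
import re

# Sample line copied from the log excerpt above.
LINE = ('127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET / HTTP/1.1" 200 3437 '
        '"-" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"')

# Pattern from the question: it expects a literal "-" after the status code.
original = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'
# Illustrative adjustment: capture the response size with (\d+) instead.
corrected = r'([\d.]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"'

no_match = re.match(original, LINE)
print(no_match)  # no match here, so calling len() on it raises the TypeError

match = re.match(corrected, LINE)
if match is not None:
    # groups() returns a tuple of the captured strings, which does have a length.
    # Note that match[0] is the whole matched text; the first group is match[1].
    fields = match.groups()
    print(len(fields))
    print(fields[0], fields[3], fields[4])
```

Checking `match is not None` before reading any group also lets lines in an unexpected format be skipped instead of stopping the script.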
I took a hint from this code. I believe the following should do it:
import json
import re

i = 0
result = {}

with open('access.log') as f:
    lines = f.readlines()

# Raw string so that backslashes reach the regex engine unchanged.
regex = r'(?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<dateandtime>.*)\] "(?P<httpstatus>(GET|POST) .+ HTTP/1\.1)" (?P<returnstatus>\d{3} \d+) (".*")(?P<browserinfo>.*)"'

for line in lines:
    r = re.match(regex, line)
    # re.match returns None for lines that do not fit the pattern; skip those.
    if r is not None:
        result[i] = {'IP address': r.group('ipaddress'),
                     'Time Stamp': r.group('dateandtime'),
                     'HTTP status': r.group('httpstatus'),
                     'Return status': r.group('returnstatus'),
                     'Browser Info': r.group('browserinfo')}
        i += 1

print(result)

with open('data.json', 'w') as fp:
    json.dump(result, fp)
Result (print(json.dumps(result, sort_keys=False, indent=4))):
{
    "0": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET / HTTP/1.1",
        "Return status": "200 3437",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "1": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /icons/openlogo-75.png HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "2": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /favicon.ico HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    }
}
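The question also asks about getting the data into Elasticsearch. One possible route is the bulk API, which accepts newline-delimited JSON: an action line followed by the document, each on its own line. The sketch below only builds that request body with the standard library and does not contact a cluster. The index name `nginx-logs`, the field names, and the helper names `parse_line` and `to_bulk` are assumptions made for illustration; the timestamp is converted to ISO 8601 on the assumption that a date-typed field is wanted.

```python
import json
import re
from datetime import datetime

# Pattern for the log format shown in the question (an assumption: a different
# NGINX log_format setting would need a different pattern).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    # "23/May/2022:22:44:14 -0400" becomes an ISO 8601 string with the UTC offset.
    # %b relies on English month names, which the default C locale provides.
    doc["time"] = datetime.strptime(doc["time"], "%d/%b/%Y:%H:%M:%S %z").isoformat()
    doc["status"] = int(doc["status"])
    doc["size"] = 0 if doc["size"] == "-" else int(doc["size"])
    return doc


def to_bulk(lines, index="nginx-logs"):
    """Build a bulk request body: an action line plus a document line per entry."""
    out = []
    for line in lines:
        doc = parse_line(line)
        if doc is None:
            continue  # skip lines that do not match the pattern
        out.append(json.dumps({"index": {"_index": index}}))
        out.append(json.dumps(doc))
    # Every line, including the last one, is terminated by a newline.
    return "".join(entry + "\n" for entry in out)


sample = [
    '127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET / HTTP/1.1" 200 3437 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"',
    '127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /favicon.ico HTTP/1.1" 404 125 '
    '"http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"',
]
print(to_bulk(sample), end="")
```

The resulting string could be sent as the body of a POST request to the cluster's `_bulk` endpoint with the header `Content-Type: application/x-ndjson`. The official Python client also provides bulk helpers, which are not shown here.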