Convert access.log to JSON format using Python

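**This is my Python code; I am trying to convert NGINX logs. I am reading the logs from the access.log file and using a regular expression to convert them to JSON format. I also need to upload these logs to Elasticsearch, so please advise on that as well. I am new to both.**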

import json
import re

i = 0
result = {}

with open('access.log') as f:
  lines = f.readlines()


regex = '([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'

for line in lines:

  r = re.match(regex,line)

  if len(r) >= 6:
    result[i] = {'IP address': r[0], 'Time Stamp': r[1], 'HTTP status': r[2], 'Return status':
                 r[3], 'Browser Info': r[4]}
    i += 1
print(result)

with open('data.json', 'w') as fp:
  json.dump(result, fp)

I am getting the following error:

Traceback (most recent call last):
   File "/home/zain/Downloads/stack.py", line 17, in <module>
    if len(r) >= 6:
TypeError: object of type 'NoneType' has no len()

This is the log format:

127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET / HTTP/1.1" 200 3437 "-" "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /icons/openlogo-75.png HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /favicon.ico HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"

The expected output is:

IP Address: 127.0.0.1 Time Stamp: 23/May/2022:22:44:14  HTTP Status: "GET / HTTP/1.1" Return Status: 200 3437  Browser Info: "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
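The `TypeError` in the traceback can be reproduced in isolation. `re.match` returns `None` when the pattern does not match, and `None` has no length. The pattern in the question expects a literal ` - ` after the status code, whereas the sample lines carry a byte count there (`200 3437`). The sketch below, using the first sample line, contrasts the original pattern with a variant; `adjusted` is a hypothetical name, and its only change is `(\d+)` in place of the `-`.

```python
import re

# First sample line from the log excerpt in the question.
line = ('127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET / HTTP/1.1" 200 3437 '
        '"-" "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"')

# Pattern from the question: after the status code it expects a literal " - ".
original = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'

# Hypothetical variant: (\d+) replaces the "-" so the byte count can match.
adjusted = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"'

no_match = re.match(original, line)
match = re.match(adjusted, line)

print(no_match)        # None, so len(no_match) raises the TypeError
print(match.groups())  # the seven captured fields
```

A match object does not support `len()` even on success, and `r[0]` is the whole match rather than the first group, so testing `r is not None` and reading `r.group(1)` onward is the safer form.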

I took a hint from this code. I believe the following should do it:

import json
import re

i = 0
result = {}

with open('access.log') as f:
    lines = f.readlines()

# Raw string, so regex escapes such as \d and \[ are not read as string escapes.
regex = r'(?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<dateandtime>.*)\] "(?P<httpstatus>(GET|POST) .+ HTTP/1\.1)" (?P<returnstatus>\d{3} \d+) (".*")(?P<browserinfo>.*)"'

for line in lines:
    r = re.match(regex, line)

    # re.match returns None when a line does not fit the pattern; skip those lines.
    if r is not None:
        result[i] = {'IP address': r.group('ipaddress'), 'Time Stamp': r.group('dateandtime'),
                     'HTTP status': r.group('httpstatus'), 'Return status':
                     r.group('returnstatus'), 'Browser Info': r.group('browserinfo')}
        i += 1

print(result)

with open('data.json', 'w') as fp:
    json.dump(result, fp)

Result (from print(json.dumps(result, sort_keys=False, indent=4))):

{
    "0": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET / HTTP/1.1",
        "Return status": "200 3437",
        "Browser Info": "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "1": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /icons/openlogo-75.png HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "2": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /favicon.ico HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    }
}
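As for getting these entries into Elasticsearch: its bulk endpoint (`_bulk`) takes newline-delimited JSON, where each document is preceded by an action line. The following is only a sketch of building such a body with the standard library, not a tested upload. The helper `to_bulk_ndjson` and the index name `nginx-access` are assumptions made up for illustration, and the sample entry is the first one from the result above.

```python
import json

# One parsed entry, in the same shape as the result shown above.
result = {
    0: {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET / HTTP/1.1",
        "Return status": "200 3437",
        "Browser Info": "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0",
    },
}


def to_bulk_ndjson(entries, index_name):
    """Build a bulk body: an action line, then the document, one JSON object per line."""
    lines = []
    for doc_id, doc in entries.items():
        # Action line: index this document under the given index and id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


# "nginx-access" is a hypothetical index name.
body = to_bulk_ndjson(result, "nginx-access")
print(body)
```

The resulting text could then be sent to the cluster's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header, for example with curl or an HTTP client of your choice.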