Python regex parsing of syslog

I have a syslog file in this format.

Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: Application Version: 8.44.0
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: Run on system: host
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: Running as user: SYSTEM
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: User has admin rights: yes
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: Start Time: 2016-03-07 13:44:55
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: IP Address: 10.10.10.10
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: CPU Count: 1
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: System Type: Server
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: System Uptime: 18.10 days
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: MODULE: InitHead MESSAGE: => Reading signature and hash files ...
Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Notice: MODULE: Init MESSAGE: file-type-signatures.cfg initialized with 80 values.
Mar  7 13:44:56 host.domain.example.net/10.10.10.10 Application: Notice: MODULE: Init MESSAGE: signatures/filename-characteristics.dat initialized with 2778 values.
Mar  7 13:44:56 host.domain.example.net/10.10.10.10 Application: Notice: MODULE: Init MESSAGE: signatures/keywords.dat initialized with 63 values.
Some logs ...
Mar  7 17:42:08 host.domain.example.net/10.10.10.10 Application: Results: MODULE: Report MESSAGE: Results: 0 Alarms, 0 Warnings, 131 Notices, 2 Errors
Mar  7 17:42:08 host.domain.example.net/10.10.10.10 Application: End: MODULE: Report MESSAGE: Begin Time: 2016-03-07 13:44:55
Mar  7 17:42:08 host.domain.example.net/10.10.10.10 Application: End: MODULE: Report MESSAGE: End Time: 2016-03-07 17:42:07
Mar  7 17:42:08 host.domain.example.net/10.10.10.10 Application: End: MODULE: Report MESSAGE: Scan took 3 hours 57 mins 11 secs

How can I extract "Application Version", "Run on system", "User has admin rights", "Start Time", "IP Address", "CPU Count", "System Type", "System Uptime", "End Time", "Alarms", "Warnings", "Notices", and "Errors" using Python?

I am new to Python, so I don't really know how to approach this. I did manage to write a function named finder():
import re

def finder(fname, pattern):
    # Return the first line of the file that matches the pattern.
    with open(fname, "r") as hand:
        for line in hand:
            line = line.rstrip()
            if re.search(pattern, line):
                return line

To get the line with the IP address, I call it like this:

finder("file path", "MESSAGE: IP Address")

This returns the whole line. I need help getting only the IP address part, as well as the other pieces of information from the other lines.

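Please check the links below before going through the code. They will help you a lot.

  1. re module - the module used here. This link gives good explanations and examples.
  2. Python Regex Tester - here you can test your regular expressions and the regex-related functions available in Python. I used it to test the regular expressions used below.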
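Before reading the full script, it may help to see what a capture group does on a single line. The following is a minimal sketch, assuming the sample line from the question. The variable names are only illustrative.

```python
import re

# Sample line copied from the log in the question.
line = ("Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: "
        "MODULE: Startup MESSAGE: IP Address: 10.10.10.10")

# The parentheses form a capture group that holds everything after the label.
match = re.search(r"MESSAGE:\s+IP Address:\s+(.*)", line)

# re.search returns None when nothing matches, so guard before calling group().
ip_address = match.group(1) if match else None
print(ip_address)  # prints 10.10.10.10
```

Here group(0) would be the whole matched text starting at "MESSAGE:", while group(1) is only the part inside the parentheses.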

Code with inline comments

import re

# The information we need to collect.
info_list = ["Application Version", "Run on system", "User has admin rights", "Start Time", "IP Address", "CPU Count", "System Type", "System Uptime", "End Time", "Results", "Begin Time"]
with open("out.txt", "r") as fo:
    for line in fo:
        for srch_pat in info_list:
            # First check whether the information we need is present in the line.
            if srch_pat in line:
                # This captures the exact value, e.g. the version number for Application Version.
                regex = re.compile(r'MESSAGE:\s+%s:\s+(.*)' % srch_pat)
                m = regex.search(line)
                if m is None:
                    # The label appears in the line, but not in the MESSAGE part.
                    continue

                if srch_pat == "Results":
                    # For the results line, this regex extracts the four counters.
                    result_regex = re.search(r'(\d+)\s+Alarms,\s+(\d+)\s+Warnings,\s+(\d+)\s+Notices,\s+(\d+)\s+Errors', m.group(1))
                    print('Alarms - ', result_regex.group(1))
                    print('Warnings - ', result_regex.group(2))
                    print('Notices - ', result_regex.group(3))
                    print('Errors - ', result_regex.group(4))
                else:
                    print(srch_pat, '-', m.group(1))

Output

C:\Users\dinesh_pundkar\Desktop>python a.py
Application Version - 8.44.0
Run on system - host
User has admin rights - yes
Start Time - 2016-03-07 13:44:55
IP Address - 10.10.10.10
CPU Count - 1
System Type - Server
System Uptime - 18.10 days
Alarms -  0
Warnings -  0
Notices -  131
Errors -  2
Begin Time - 2016-03-07 13:44:55
End Time - 2016-03-07 17:42:07
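
If the values are needed for further processing rather than just printed, the same idea can be wrapped in a function that returns a dictionary. This is only a sketch under assumptions: parse_log, FIELD_RE, RESULTS_RE and COUNT_RE are hypothetical names that do not appear in the code above, and the log lines are assumed to follow the format shown in the question.

```python
import re

FIELDS = ["Application Version", "Run on system", "User has admin rights",
          "Start Time", "IP Address", "CPU Count", "System Type",
          "System Uptime", "End Time"]

# One pattern covering all labels; re.escape guards against special characters.
FIELD_RE = re.compile(r"MESSAGE:\s+(%s):\s+(.*)" % "|".join(map(re.escape, FIELDS)))
RESULTS_RE = re.compile(r"MESSAGE:\s+Results:\s+(.*)")
COUNT_RE = re.compile(r"(\d+)\s+(Alarms|Warnings|Notices|Errors)")

def parse_log(lines):
    """Hypothetical helper: collect the requested fields into a dict."""
    info = {}
    for line in lines:
        line = line.rstrip()
        m = FIELD_RE.search(line)
        if m:
            info[m.group(1)] = m.group(2)
            continue
        m = RESULTS_RE.search(line)
        if m:
            # Split "0 Alarms, 0 Warnings, ..." into name -> integer count.
            for count, name in COUNT_RE.findall(m.group(1)):
                info[name] = int(count)
    return info

# Two sample lines copied from the log in the question.
sample = [
    "Mar  7 13:44:55 host.domain.example.net/10.10.10.10 Application: Info: MODULE: Startup MESSAGE: IP Address: 10.10.10.10",
    "Mar  7 17:42:08 host.domain.example.net/10.10.10.10 Application: Results: MODULE: Report MESSAGE: Results: 0 Alarms, 0 Warnings, 131 Notices, 2 Errors",
]
info = parse_log(sample)
print(info["IP Address"], info["Notices"])  # prints 10.10.10.10 131
```

Since parse_log accepts any iterable of lines, an open file object can be passed directly, for example inside a `with open("out.txt") as fo:` block.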