Challenging regex clause in Python - Suricata / fast.log
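Can any regex wizards help?
I'm trying to write a regex that parses the Suricata fast log. So far I have found an old post here that sort of works, but I would like to capture all of the data in the log.
Right now I can get the time, date, source IP, source port, destination IP, and destination port, but I would also like to get the alert title, classification, and priority.
Log file: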
03/21/2021-20:24:02.524057 [**] [1:2006380:14] ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted [**] [Classification: Potential Corporate Privacy Violation] [Priority: 1] {TCP} 192.168.10.14:48820 -> 192.168.10.18:8086
03/21/2021-20:24:23.567546 [**] [1:2014939:5] ET POLICY DNS Query for TOR Hidden Domain .onion Accessible Via TOR [**] [Classification: Potential Corporate Privacy Violation] [Priority: 1] {UDP} 192.168.10.14:49405 -> 192.168.10.1:53
Python file:
import re

# Captures date, time, then the source and destination ip:port pairs.
pattern = r'([0-9/]+)-([0-9:.]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})'

with open('fast.log', 'r') as log_file:
    for line in log_file:
        r_search = re.search(pattern, line)
        if r_search is None:
            continue  # skip lines that do not match instead of raising AttributeError
        print(f'Date - {r_search.group(1)}')
        print(f'Time - {r_search.group(2)}')
        print(f'Scr IP - {r_search.group(3)}')
        print(f'Scr Port - {r_search.group(4)}')
        print(f'Dess IP - {r_search.group(5)}')
        print(f'Dess Port - {r_search.group(6)}')
        print('***********')
Current output:
Date - 03/21/2021
Time - 20:24:02.524057
Scr IP - 192.168.10.14
Scr Port - 48820
Dess IP - 192.168.10.18
Dess Port - 8086
***********
Desired output:
Date - 03/21/2021
Time - 20:24:02.524057
Alert Rule - ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted
Classification - Potential Corporate Privacy Violation
Priority - 1
Scr IP - 192.168.10.14
Scr Port - 48820
Dess IP - 192.168.10.18
Dess Port - 8086
***********
Thanks!
The following regex pattern seems to work here:
import re

logs = ['03/21/2021-20:24:02.524057 [**] [1:2006380:14] ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted [**] [Classification: Potential Corporate Privacy Violation] [Priority: 1] {TCP} 192.168.10.14:48820 -> 192.168.10.18:8086', '03/21/2021-20:24:23.567546 [**] [1:2014939:5] ET POLICY DNS Query for TOR Hidden Domain .onion Accessible Via TOR [**] [Classification: Potential Corporate Privacy Violation] [Priority: 1] {UDP} 192.168.10.14:49405 -> 192.168.10.1:53']
for log in logs:
    matches = re.findall(r'^(.*?)-(\S+)\s+\[.*?\]\s+\[.*?\]\s+(.*?)\s+\[.*?\]\s+\[(.*?)\]\s+\[(.*?)\].*?(\d+(?:\.\d+)*):(\d+)\s+->\s+(\d+(?:\.\d+)*):(\d+).*$', log)
    print(matches)
This prints:
[('03/21/2021',
'20:24:02.524057',
'ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted',
'Classification: Potential Corporate Privacy Violation',
'Priority: 1',
'192.168.10.14',
'48820',
'192.168.10.18',
'8086')]
[('03/21/2021',
'20:24:23.567546',
'ET POLICY DNS Query for TOR Hidden Domain .onion Accessible Via TOR',
'Classification: Potential Corporate Privacy Violation',
'Priority: 1',
'192.168.10.14',
'49405',
'192.168.10.1',
'53')]
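Note that the classification and priority groups above still carry their `Classification: ` and `Priority: ` prefixes, while the desired output in the question has them stripped. One possible variation, offered only as a sketch, is to anchor on those literal labels and use named groups. The names `FAST_LOG_RE`, `parse_fast_line`, and `format_record` are made up for illustration, and the pattern assumes IPv4 addresses and exactly the line layout of the two sample lines; other fast.log variants may need adjustments.

```python
import re

# Hypothetical names; the pattern assumes the layout of the two sample lines
# in the question (IPv4 only, Classification and Priority both present).
FAST_LOG_RE = re.compile(
    r'^(?P<date>\d{2}/\d{2}/\d{4})-(?P<time>\S+)\s+'
    r'\[\*\*\]\s+\[(?P<sid>[^\]]+)\]\s+'            # [**] [gid:sid:rev]
    r'(?P<alert>.*?)\s+\[\*\*\]\s+'                 # alert title up to the next [**]
    r'\[Classification:\s*(?P<classification>[^\]]*)\]\s+'
    r'\[Priority:\s*(?P<priority>\d+)\]\s+'
    r'\{(?P<proto>[^}]+)\}\s+'
    r'(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<src_port>\d+)\s+->\s+'
    r'(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<dst_port>\d+)'
)


def parse_fast_line(line):
    """Return a dict of the captured fields, or None if the line does not match."""
    m = FAST_LOG_RE.search(line)
    return m.groupdict() if m else None


def format_record(rec):
    """Render one parsed record using the labels from the question's desired output."""
    return '\n'.join([
        f"Date - {rec['date']}",
        f"Time - {rec['time']}",
        f"Alert Rule - {rec['alert']}",
        f"Classification - {rec['classification']}",
        f"Priority - {rec['priority']}",
        f"Scr IP - {rec['src_ip']}",
        f"Scr Port - {rec['src_port']}",
        f"Dess IP - {rec['dst_ip']}",
        f"Dess Port - {rec['dst_port']}",
        '***********',
    ])


SAMPLE = ('03/21/2021-20:24:02.524057 [**] [1:2006380:14] ET POLICY Outgoing '
          'Basic Auth Base64 HTTP Password detected unencrypted [**] '
          '[Classification: Potential Corporate Privacy Violation] [Priority: 1] '
          '{TCP} 192.168.10.14:48820 -> 192.168.10.18:8086')
```

Calling `print(format_record(parse_fast_line(SAMPLE)))` should give the block from the question's desired output. When reading a real `fast.log`, check for `None` before formatting, since any line that deviates from the assumed layout will not match.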