Regex capture group which might not be present

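I am running a regex over some log files. The capture groups should pick up a few relevant fields. I also want to know whether the log mentions that the job ended successfully, which can be concluded from the presence of the string "Job executed successfully".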

My regex so far: ^Job started at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+orderno\s+-\s+'(\w+)'\s+runno\s+-\s+'(\d+)'[\s\S]+Host1\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)' - Host2\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)'[\s\S]+(Job executed successfully)?[\s\S]+Job ended at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+Elapsed time\s\[([\d.]+)sec\]\sCPU usage\s\[([\d.]+)sec]

(I'm new to regex, so it is far from perfect and could use some hardening.)

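Sample log that ended successfully. The regex above only captures the success string here when the question mark after "(Job executed successfully)?" is removed, which I don't think should be necessary.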

Job started at '0902 23:56:00:367' orderno - '0tzh0' runno - '00064' Number of transfers - 1

Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'

Local host is: xxx - Windows 200x [601] Service Pack 1 build 7601 - Intel64 Family 6 Model 37 Stepping 1, GenuineIntel

********** Starting transfer #1 out of 1 *************** Transfer #1 completed successfully

Job executed successfully. exiting.

Job ended at '0902 23:56:07:138' Elapsed time [7sec] CPU usage [0.15sec]

Sample log that ended in failure. The regex above works fine on it.

Job started at '0831 15:26:00:365' orderno - '0tuq5' runno - '00030' Number of transfers - 4

Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'

Local host is: xxx - Windows 200x [601] Service Pack 1 build 7601 - Intel64 Family 6 Model 37 Stepping 1, GenuineIntel

********** Starting transfer #1 out of 4 *************** Unable to connect to SSH server on 'xxx.xxx.xx': SFTP_Connect : psftp_connect failed : ssh_init: Network error: Connection timed out .

Connection to host sftp.onenet.be could not be established

Job ended at '0831 15:26:21:426'

Elapsed time [21sec] CPU usage [0.0sec]

If you are using PCRE, you can use the very handy \Q...\E quoting sequence together with a negative lookahead:

^\QJob started\E
(?:(?!\QJob ended\E).)+?
^\QJob executed successfully\E

See a demo on regex101.com (mind the multiline, verbose and singleline modifiers!).

Without \Q...\E and verbose mode, the whole expression becomes somewhat unreadable:

^Job started(?:(?!Job ended).)+?^Job executed successfully
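As a rough illustration, the same pattern can be tried in Python. This is only a sketch: Python's re module does not support \Q...\E, so the literals are written out directly, and the two logs below are shortened versions of the samples from the question. The names SUCCESS_IN_JOB and job_succeeded are made up for this example.

```python
import re

# Python's re module has no \Q...\E; the literals contain no
# metacharacters here, so they are written as-is (re.escape() would
# be the general replacement for quoting arbitrary text).
SUCCESS_IN_JOB = re.compile(
    r"^Job started(?:(?!Job ended).)+?^Job executed successfully",
    re.MULTILINE | re.DOTALL,  # ^ matches at line starts, . matches newlines
)

# Shortened versions of the two sample logs from the question.
ok_log = (
    "Job started at '0902 23:56:00:367' orderno - '0tzh0' runno - '00064'\n"
    "Transfer #1 completed successfully\n"
    "Job executed successfully. exiting.\n"
    "Job ended at '0902 23:56:07:138' Elapsed time [7sec] CPU usage [0.15sec]\n"
)
failed_log = (
    "Job started at '0831 15:26:00:365' orderno - '0tuq5' runno - '00030'\n"
    "Connection to host sftp.onenet.be could not be established\n"
    "Job ended at '0831 15:26:21:426'\n"
    "Elapsed time [21sec] CPU usage [0.0sec]\n"
)


def job_succeeded(log: str) -> bool:
    """Return True if a job block contains the success line before 'Job ended'."""
    return SUCCESS_IN_JOB.search(log) is not None


print(job_succeeded(ok_log))      # True
print(job_succeeded(failed_log))  # False
```

The (?!Job ended) check before every character is what keeps a match from running out of a failed job and into the success line of a later one.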

With only minimal changes to your regex, you can use this:

^Job started at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+orderno\s+-\s+'(\w+)'\s+runno\s+-\s+'(\d+)'[\s\S]+?Host1\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)' - Host2\s'([\w.]+)'\[([\w-]+?)\] username '([\w\\]+)'[\s\S]+?(?:(Job executed successfully)[\s\S]+?)?Job ended at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+Elapsed time\s\[([\d.]+)sec\]\sCPU usage\s\[([\d.]+)sec]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^^^-----------------------------------^^

(The ^ markers above indicate the main change: the success string and the [\s\S]+? after it are wrapped in an optional non-capturing group.)

I also turned some of the quantifiers into lazy ones, which should make things a bit faster.

regex101 demo
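For reference, here is a minimal sketch of how the adjusted regex could be used from Python. It assumes the username character class is meant to be [\w\\]+ (so that 'xxx\xxx' matches) and uses shortened copies of the two sample logs. JOB_RE and summarize are hypothetical names for this example.

```python
import re

JOB_RE = re.compile(
    r"^Job started at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+orderno\s+-\s+'(\w+)'"
    r"\s+runno\s+-\s+'(\d+)'[\s\S]+?Host1\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)'"
    r" - Host2\s'([\w.]+)'\[([\w-]+?)\] username '([\w\\]+)'[\s\S]+?"
    r"(?:(Job executed successfully)[\s\S]+?)?"  # group 10: optional success marker
    r"Job ended at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+Elapsed time\s\[([\d.]+)sec\]"
    r"\sCPU usage\s\[([\d.]+)sec]",
    re.MULTILINE,
)

# Raw strings keep the backslash in 'xxx\xxx' literal.
ok_log = r"""Job started at '0902 23:56:00:367' orderno - '0tzh0' runno - '00064' Number of transfers - 1
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
********** Starting transfer #1 out of 1 *************** Transfer #1 completed successfully
Job executed successfully. exiting.
Job ended at '0902 23:56:07:138' Elapsed time [7sec] CPU usage [0.15sec]
"""

failed_log = r"""Job started at '0831 15:26:00:365' orderno - '0tuq5' runno - '00030' Number of transfers - 4
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
Connection to host sftp.onenet.be could not be established
Job ended at '0831 15:26:21:426'
Elapsed time [21sec] CPU usage [0.0sec]
"""


def summarize(log):
    """Return a few captured fields, or None if the log does not match."""
    m = JOB_RE.search(log)
    if m is None:
        return None
    return {
        "orderno": m.group(2),
        # An optional group that did not participate in the match is None.
        "succeeded": m.group(10) is not None,
        "elapsed": float(m.group(12)),
    }


print(summarize(ok_log))
print(summarize(failed_log))
```

Both logs match as a whole; only the value of group 10 tells the two outcomes apart.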

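Your current regex matches everything up to the end first, because [\s\S]+ is greedy. It then backtracks (from right to left), testing (Job executed successfully)?[\s\S]+ at each step. Since the group is optional, it is allowed to match nothing, so the engine stops as soon as the second [\s\S]+ finds Job ended right after it, and the group stays empty.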

In the approach above, we instead check each character from left to right until we reach the part we need, namely Job executed successfully, if it is present.
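The difference is easy to see on a stripped-down case. The snippet below is a small sketch in Python with both patterns reduced to their essentials. The three-line text is made up for the demonstration and is not one of the original logs.

```python
import re

text = "Job started\nJob executed successfully. exiting.\nJob ended\n"

# Greedy: the first [\s\S]+ swallows everything, then gives characters
# back only until "Job ended" can match; the optional group matches nothing.
greedy = re.compile(
    r"^Job started[\s\S]+(Job executed successfully)?[\s\S]+Job ended"
)

# Lazy: the first [\s\S]+? grows one character at a time, so the
# optional group is tried at the success line before "Job ended" is reached.
lazy = re.compile(
    r"^Job started[\s\S]+?(?:(Job executed successfully)[\s\S]+?)?Job ended"
)

print(greedy.search(text).group(1))  # None
print(lazy.search(text).group(1))    # Job executed successfully
```

Both patterns match the whole text. They differ only in whether the optional group gets a chance to capture.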