Regex capture group which might not be present
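I am running a regex against some log files.
The capture groups should pick up a number of relevant fields.
I also want to know whether the log reports that the job ended successfully, which can be inferred from the presence of the string "Job executed successfully".
My regex so far: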
^Job started at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+orderno\s+-\s+'(\w+)'\s+runno\s+-\s+'(\d+)'[\s\S]+Host1\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)' - Host2\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)'[\s\S]+(Job executed successfully)?[\s\S]+Job ended at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+Elapsed time\s\[([\d.]+)sec\]\sCPU usage\s\[([\d.]+)sec]
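(I am new to regex, so it is far from perfect and could use some hardening.)
Sample log that ended successfully. Here the regex above only captures the string if the question mark after "(Job executed successfully)" is removed, which I don't think should be necessary: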
Job started at '0902 23:56:00:367' orderno - '0tzh0' runno - '00064'
Number of transfers - 1
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
Local host is: xxx - Windows 200x [601] Service Pack 1 build 7601 - Intel64 Family 6 Model 37 Stepping 1, GenuineIntel
********** Starting transfer #1 out of 1 ***************
Transfer #1 completed successfully
Job executed successfully. exiting.
Job ended at '0902 23:56:07:138'
Elapsed time [7sec] CPU usage [0.15sec]
Sample log that ended in failure. The regex above works fine here:
Job started at '0831 15:26:00:365' orderno - '0tuq5' runno - '00030'
Number of transfers - 4
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
Local host is: xxx - Windows 200x [601] Service Pack 1 build 7601 - Intel64 Family 6 Model 37 Stepping 1, GenuineIntel
********** Starting transfer #1 out of 4 ***************
Unable to connect to SSH server on 'xxx.xxx.xx': SFTP_Connect : psftp_connect failed : ssh_init: Network error: Connection timed out
.
Connection to host sftp.onenet.be could not be established
Job ended at '0831 15:26:21:426'
Elapsed time [21sec] CPU usage [0.0sec]
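If you are using PCRE, you can use the incredible \Q...\E sequence (everything between the two markers is treated as literal text) together with a negative lookahead: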
^\QJob started\E
(?:(?!\QJob ended\E).)+?
^\QJob executed successfully\E
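See a demo on regex101.com (mind the multiline, verbose and singleline modifiers!).
If you are not using PCRE, the same expression written on a single line without \Q...\E is somewhat less readable: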
^Job started(?:(?!Job ended).)+?^Job executed successfully
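As a rough illustration of the single-line form, here is a minimal Python sketch. Python's re module has no \Q...\E, so the literals are written out directly. The two logs are abbreviated from the question, and the names SUCCESS_RE and job_succeeded are made up for this example.

```python
import re

# Abbreviated versions of the two sample logs from the question.
SUCCESS_LOG = """Job started at '0902 23:56:00:367' orderno - '0tzh0' runno - '00064'
Transfer #1 completed successfully
Job executed successfully. exiting.
Job ended at '0902 23:56:07:138'
"""

FAILURE_LOG = """Job started at '0831 15:26:00:365' orderno - '0tuq5' runno - '00030'
Connection to host sftp.onenet.be could not be established
Job ended at '0831 15:26:21:426'
"""

# re.M: ^ matches at the start of every line (multiline).
# re.S: . also matches newlines (singleline).
# (?:(?!Job ended).)+? advances one character at a time and refuses to
# step onto the start of "Job ended", so a match cannot leak into the next job.
SUCCESS_RE = re.compile(
    r"^Job started(?:(?!Job ended).)+?^Job executed successfully",
    re.M | re.S,
)


def job_succeeded(log: str) -> bool:
    """Return True if the log contains a job block that reports success."""
    return SUCCESS_RE.search(log) is not None


print(job_succeeded(SUCCESS_LOG))  # True
print(job_succeeded(FAILURE_LOG))  # False
```

This only answers the yes/no question; it does not capture the other fields.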
With minimal changes to your regex, you can use this:
^Job started at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+orderno\s+-\s+'(\w+)'\s+runno\s+-\s+'(\d+)'[\s\S]+?Host1\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)' - Host2\s'([\w.]+)'\[([\w-]+?)\] username '([\w\\]+)'[\s\S]+?(?:(Job executed successfully)[\s\S]+?)?Job ended at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+Elapsed time\s\[([\d.]+)sec\]\sCPU usage\s\[([\d.]+)sec]
(The main change is the non-capturing group (?: ... )? wrapped around (Job executed successfully)[\s\S]+?, so the capture group and the text that follows it are optional as one unit.)
I also converted some quantifiers to lazy ones, which should make matching somewhat faster.
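To check the adjusted regex against both sample logs, a sketch along these lines could be used in Python. It assumes the username character class is meant to be [\w\\] (word characters or a backslash). The helper name parse_job and the returned keys are invented for illustration, and the logs are slightly shortened.

```python
import re

# The adjusted regex, split over several lines for readability only.
PATTERN = re.compile(
    r"^Job started at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+orderno\s+-\s+'(\w+)'"
    r"\s+runno\s+-\s+'(\d+)'[\s\S]+?"
    r"Host1\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)' - "
    r"Host2\s'([\w.]+)'\[([\w-]+?)\] username '([\w\\]+)'[\s\S]+?"
    r"(?:(Job executed successfully)[\s\S]+?)?"  # group 10, may be absent
    r"Job ended at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+"
    r"Elapsed time\s\[([\d.]+)sec\]\sCPU usage\s\[([\d.]+)sec]",
    re.M,
)

# Raw strings, because the usernames contain a backslash.
SUCCESS_LOG = r"""Job started at '0902 23:56:00:367' orderno - '0tzh0' runno - '00064'
Number of transfers - 1
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
********** Starting transfer #1 out of 1 ***************
Transfer #1 completed successfully
Job executed successfully. exiting.
Job ended at '0902 23:56:07:138'
Elapsed time [7sec] CPU usage [0.15sec]
"""

FAILURE_LOG = r"""Job started at '0831 15:26:00:365' orderno - '0tuq5' runno - '00030'
Number of transfers - 4
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
********** Starting transfer #1 out of 4 ***************
Connection to host sftp.onenet.be could not be established
Job ended at '0831 15:26:21:426'
Elapsed time [21sec] CPU usage [0.0sec]
"""


def parse_job(log: str):
    """Return a few captured fields, or None if the log does not match."""
    m = PATTERN.search(log)
    if m is None:
        return None
    return {
        "orderno": m.group(2),
        "runno": m.group(3),
        # An optional group that did not participate is reported as None.
        "succeeded": m.group(10) is not None,
        "elapsed_sec": float(m.group(12)),
    }


print(parse_job(SUCCESS_LOG)["succeeded"])  # True
print(parse_job(FAILURE_LOG)["succeeded"])  # False
```

In Python an optional group that took no part in the match comes back as None, which is what the succeeded check relies on; other regex flavours may report an empty string instead.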
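Because [\s\S]+ is greedy, your current regex first consumes everything up to the end of the input and then backtracks from right to left while testing (Job executed successfully)?[\s\S]+. The optional group is allowed to match nothing, so the overall match succeeds as soon as backtracking reaches a point from which Job ended can be matched. By then the text Job executed successfully has already been consumed by the first [\s\S]+, and the group stays empty.
In the approach above, we instead move from left to right, character by character, until we reach the part we need, namely Job executed successfully, if it is present.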
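The difference can be reproduced on a much smaller input. The following Python sketch uses a made-up three-line text and stripped-down versions of both patterns; only the quantifiers and the optional group mirror the regexes discussed here.

```python
import re

# Hypothetical miniature log, just enough to show the effect.
text = "start\nJob executed successfully. exiting.\nJob ended"

# Greedy form, as in the question: the first [\s\S]+ swallows the marker
# text and the optional group is satisfied by matching nothing.
greedy = re.search(
    r"start[\s\S]+(Job executed successfully)?[\s\S]+Job ended", text
)

# Lazy form with the optional part wrapped in a non-capturing group:
# the marker is tried at each position before moving one character on.
lazy = re.search(
    r"start[\s\S]+?(?:(Job executed successfully)[\s\S]+?)?Job ended", text
)

print(greedy.group(1))  # None
print(lazy.group(1))    # Job executed successfully
```

Both patterns match the whole text; only the content of group 1 differs.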