Extract the first line with a specific pattern using regexp

Question

I have a string:

set text {show log

===============================================================================
Event Log 
===============================================================================
Description : Default System Log
Log contents  [size=500   next event=7  (not wrapped)]

6 2020/05/22 12:36:05.81 UTC CRITICAL: IOM #2001 Base IOM
"IOM:1>some text here routes "

5 2020/05/22 12:36:05.52 UTC CRITICAL: IOM #2001 Base IOM
"IOM:2>some other text routes "

4 2020/05/22 12:36:05.10 UTC MINOR: abc #2001 some text here also 222 def "

3 2020/05/22 12:36:05.09 UTC WARNING: abc #2011 some text here 111 ghj"

1 2020/05/22 12:35:47.60 UTC INDETERMINATE: ghe #2010 a,b, c="7" "
}

I want to use regexp in Tcl to extract the first line that starts with "IOM:", i.e.

IOM:1>some text here routes

But my attempt does not work. Can anyone help?

regexp -nocase -lineanchor -- {^\s*(IOM:)\s*\s*(.*?)routes$} $line match tag value

Answer 1

You can use either of the following:

regexp -nocase -- {(?n)^"IOM:.*} $text match
regexp -nocase -line -- {^"IOM:.*} $text match

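See the Tcl demo.

Details

(?n) - (same as the -line option) enables newline-sensitive mode, so that . cannot match a newline. From the Tcl regex docs: "If newline-sensitive matching is specified, . and bracket expressions using ^ will never match the newline character (so that matches will never cross newlines unless the RE explicitly arranges it) and ^ and $ will match the empty string after and before a newline respectively, in addition to matching at beginning and end of string respectively."
^ - start of a line
"IOM: - the literal string "IOM:
.* - the rest of the line, up to its end.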
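To illustrate the points above, here is a minimal sketch that applies the -line pattern to a shortened, hypothetical copy of the log and also drops the surrounding quotes. The capture group and the string trim step are additions of this sketch, not part of the answer, and the variable name line is only illustrative.

```tcl
# Hypothetical, shortened sample in the same shape as the log in the question.
set text {6 2020/05/22 12:36:05.81 UTC CRITICAL: IOM #2001 Base IOM
"IOM:1>some text here routes "

5 2020/05/22 12:36:05.52 UTC CRITICAL: IOM #2001 Base IOM
"IOM:2>some other text routes "
}

# -line lets ^ match at the start of every line and keeps . inside one line.
# Without -all, regexp stops at the first match.
# The capture group leaves out the opening quote.
if {[regexp -nocase -line -- {^"(IOM:.*)} $text match line]} {
    # Strip the closing quote and the blank before it.
    set line [string trim $line "\" "]
    puts $line
}
```

Because only the first match is reported, the second "IOM:2>..." line is never returned; add -all -inline if every matching line is needed.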

Answer 2

In addition to @Wiktor's excellent answer, you may also want to iterate over the matches:

set re {^\s*"(IOM):(.*)routes.*$}
foreach {match tag value} [regexp -all -inline -nocase -line -- $re $text] {
    puts [list $tag $value]
}

IOM {1>some text here }
IOM {2>some other text }

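I see that your regex has a non-greedy part. The Tcl regex engine is a bit odd compared with other languages: the first quantifier in the regex sets the greediness for the whole regex.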

set re {^\s*(IOM:)\s*\s*(.*?)routes$}   ; # whole regex is greedy
set re {^\s*?(IOM:)\s*\s*(.*?)routes$}  ; # whole regex is non-greedy
# .........^^
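The effect is easier to see on a tiny input. The following sketch uses a made-up string and made-up patterns (none of them come from the question) to compare single-quantifier and mixed-quantifier expressions. It relies on the rule in the re_syntax manual that the overall match is as long or as short as possible according to the preference of the first quantified atom.

```tcl
set s "xayby"

# One quantifier: the familiar behaviour.
regexp {x.*y}  $s greedy        ;# longest match
regexp {x.*?y} $s lazy          ;# shortest match

# Mixed quantifiers: the first one decides for the whole expression.
regexp {x.*?y.*} $s lazyFirst   ;# non-greedy first, so shortest overall match
regexp {x.*y.*?} $s greedyFirst ;# greedy first, so longest overall match

puts [list $greedy $lazy $lazyFirst $greedyFirst]
# prints: xayby xay xay xayby
```

Note that the trailing .* in the third pattern matches nothing at all, even though it would be greedy on its own.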
