How to combine regular expressions for matching strings with and without delimiters?

Question

I have strings like the inputs below that need to be parsed and converted into name/value pairs, as shown:

Input: FOO = BAR=BAZ  Output: name='FOO', value='BAR=BAZ'

Input: FOO = BAR  Output: name='FOO', value='BAR'

Input: FOO =  Output: name='FOO', value=''

Input: = BAR=BAZ  Output: name='', value='BAR=BAZ'

Input: = BAR  Output: name='', value='BAR'

Input: FOO  Output: name='FOO', value=''

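Note that the separator is either = or :. A string with no separator at all is also valid.

The following code covers all of the above cases except the last one: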

regexp {^\s*(.*?)\s*[=:]\s*(.*?)\s*$} $setting -> name value

if {![info exists name]} {
    set name {}
}

if {![info exists value]} {
    set value {}
}

puts "name='$name', value='$value'"

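For that last input it returns

Output: name='', value=''

instead of

Output: name='FOO', value=''

The last case on its own can be handled with the following regular expression:

regexp {^\s*(.*?)\s*$} $setting -> name value

How can these regular expressions be combined into a single regular expression that covers all the cases?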

Answer 1

How those regular expressions could be combined to just have a single regular expression covering all the cases?

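The former already contains the latter :) But your broader regex fails to match the last case (FOO), because that string contains no delimiter at all. Check the return value of [regexp]: it will be 0, so name and value are never assigned.

Consider the following: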
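Since [regexp] reports success through its return value, one way to cover the no-separator case without touching the original pattern is to branch on that result. A minimal sketch, assuming Tcl 8.5+ (for lassign); the proc name parseSetting is hypothetical and not part of the question:

```tcl
# Hypothetical helper: try the question's separator regex first and fall
# back to "the whole string is the name" when [regexp] returns 0.
proc parseSetting {setting} {
    if {[regexp {^\s*(.*?)\s*[=:]\s*(.*?)\s*$} $setting -> name value]} {
        return [list $name $value]
    }
    # No = or : found, so there is no value part.
    return [list [string trim $setting] {}]
}

foreach setting {{FOO = BAR=BAZ} {FOO = BAR} {FOO =} {= BAR=BAZ} {= BAR} FOO} {
    lassign [parseSetting $setting] name value
    puts "name='$name', value='$value'"
}
```

This keeps two code paths instead of one regex, which is exactly what the question wants to avoid, but it makes the role of the return value explicit.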

 ^\s*([^=:]*)\s*[=:]?\s*(.*)\s*$

This should cover all cases, even the value-only (RHS-only) case.
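A quick way to try this pattern on the sample inputs is sketched below (assumes Tcl 8.5+; the variable names are illustrative). One detail worth knowing: ([^=:]*) is greedy, so for FOO = BAR it captures 'FOO ' including the space before the separator, and (.*) likewise keeps any trailing whitespace, so the sketch trims both captures afterwards.

```tcl
set pattern {^\s*([^=:]*)\s*[=:]?\s*(.*)\s*$}
set results {}

foreach setting {{FOO = BAR=BAZ} {FOO = BAR} {FOO =} {= BAR=BAZ} {= BAR} FOO} {
    # With the separator optional, every sample matches (regexp returns 1).
    regexp $pattern $setting -> name value
    # ([^=:]*) keeps spaces before the separator and (.*) keeps trailing
    # spaces, so strip them from both captures.
    set name [string trimright $name]
    set value [string trimright $value]
    lappend results [list $name $value]
    puts "name='$name', value='$value'"
}
```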

Answer 2

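It's not clear to me why you insist on doing this with regexp. When your regular expression gets too complicated, it may be time for a different approach. Assuming your strings never contain NUL characters, you can do this:

lassign [split [regsub {\s*[:=]\s*} [string trim $setting] \0] \0] name value

string trim removes any surrounding whitespace. Then the separator and any whitespace around it are replaced with a NUL character. Finally, the result is split at the NUL character, and the resulting parts are assigned to the name and value variables.

In my measurements, this approach is more than twice as fast as the regular expression variant.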
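The one-liner can be unpacked into named steps to make each stage visible. A sketch assuming Tcl 8.5+ (for lassign); the proc name splitSetting is hypothetical:

```tcl
proc splitSetting {setting} {
    # 1. Drop leading/trailing whitespace.
    set trimmed [string trim $setting]
    # 2. Replace the FIRST separator (regsub without -all) and the
    #    whitespace around it with a NUL character. The unbraced \0 is
    #    turned into NUL by Tcl's own backslash substitution before regsub
    #    sees it, so it is not treated as a "whole match" reference.
    set marked [regsub {\s*[:=]\s*} $trimmed \0]
    # 3. Split on NUL: one element if there was no separator, else two.
    # 4. lassign sets any variable left over (here value) to "".
    lassign [split $marked \0] name value
    return [list $name $value]
}

foreach setting {{FOO = BAR=BAZ} {FOO = BAR} {FOO =} {= BAR=BAZ} {= BAR} FOO} {
    lassign [splitSetting $setting] name value
    puts "name='$name', value='$value'"
}
```

Because regsub replaces only the first match, any later = or : stays inside the value (BAR=BAZ).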
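To repeat such a measurement, Tcl's built-in [time] command can run each variant many times and report the average cost. The sketch below assumes Answer 3's pattern as "the regular expression variant" (the answer does not say which regex it timed), plus an arbitrary sample string and iteration count; results will differ by machine and Tcl version, so no figures are given here.

```tcl
# Sample input and iteration count are arbitrary choices.
set setting {FOO = BAR=BAZ}
set iterations 10000

# Regex variant (assumed here to be Answer 3's pattern).
set regexpTime [time {
    regexp {(.*?)(?:\s*[:=]\s*(.*))?$} $setting -> name value
} $iterations]

# trim + regsub + split variant from this answer.
set splitTime [time {
    lassign [split [regsub {\s*[:=]\s*} [string trim $setting] \0] \0] name value
} $iterations]

# Each [time] call returns a string like "N microseconds per iteration".
puts "regexp variant: $regexpTime"
puts "split variant:  $splitTime"
```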

Answer 3

set tests {{FOO = BAR=BAZ} {FOO = BAR} {FOO =} {= BAR=BAZ} {= BAR} FOO}
foreach test $tests {
    # expanded regex with commentary
    regexp {(?x)
        (.*?)               # the left-hand side, may be empty
        (?:                 # start a group, but do not capture it
            \s*[:=]\s*      # the separator
            (.*)            # the value
        )?                  # end the group, and it is optional
        $                   # until the end of line: this is required because the
                            # whole regex is non-greedy due to the first
                            # quantifier being non-greedy. Without the anchor,
                            # the 2nd capture will always be the empty string.
    } $test -> var value

    puts "name='$var', value='$value'"
}

Output:

name='FOO', value='BAR=BAZ'
name='FOO', value='BAR'
name='FOO', value=''
name='', value='BAR=BAZ'
name='', value='BAR'
name='FOO', value=''
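None of the sample inputs use the : separator or carry outer whitespace, and the pattern above does not strip it (a leading space stays in the name, a trailing one in the value). One option is to trim the input before matching. A sketch wrapping the same pattern in a proc, assuming Tcl 8.5+; the name parseTrimmedSetting is hypothetical:

```tcl
proc parseTrimmedSetting {setting} {
    # Same pattern as above on one line; trimming first keeps outer
    # whitespace out of both captures. The pattern always matches, and a
    # skipped optional group leaves value set to "".
    regexp {(.*?)(?:\s*[:=]\s*(.*))?$} [string trim $setting] -> name value
    return [list $name $value]
}

foreach setting {{  FOO : BAR  } FOO:BAR {FOO = BAR=BAZ} { FOO }} {
    lassign [parseTrimmedSetting $setting] name value
    puts "name='$name', value='$value'"
}
```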


regex

tcl