Regular expression for single pairs of (nested) brackets, but excluding interleaved ones?
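There are plenty of solutions on SO and elsewhere for matching pairs of parentheses or brackets, but none that I can find or come up with excludes interleaved ones. What is the solution to this challenge: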
Write a regular expression for a string containing any number of X
and single pairs of < > and { } which may be nested but not
inter-leaved. For example these strings are allowed:
XXX<XX{X}XXX>X
X{X}X<X>X{X}X<X>X
But these are not allowed:
XXX<X<XX>>XX
XX<XX{XX>XX}XX
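Can this be done with a regular expression (a finite automaton) alone? Is a pushdown automaton not needed?
Besides the ban on interleaving, note the requirement for single pairs: nesting may be only one level deep, and only with the other bracket type, as shown.
No preference as to which regex engine/language is used.
Use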
^[^<>{}]*(?:(?:<[^<>]*>|{[^{}]*})+[^<>{}]*)*$
See regex proof.
Explanation
--------------------------------------------------------------------------------
  ^                the beginning of the string
  [^<>{}]*         any character except: '<', '>', '{', '}' (0 or more times, greedy)
  (?:              group, but do not capture (0 or more times, greedy):
    (?:            group, but do not capture (1 or more times, greedy):
      <            '<'
      [^<>]*       any character except: '<', '>' (0 or more times, greedy)
      >            '>'
     |             OR
      {            '{'
      [^{}]*       any character except: '{', '}' (0 or more times, greedy)
      }            '}'
    )+             end of grouping
    [^<>{}]*       any character except: '<', '>', '{', '}' (0 or more times, greedy)
  )*               end of grouping
  $                before an optional \n, and the end of the string
--------------------------------------------------------------------------------
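As a quick check, the pattern above can be run against the four strings from the question. This is a minimal sketch in Python; the helper name `matches` is an assumption made for illustration, and the braces are escaped here only for portability across engines. The last case is not from the question: because `[^<>]*` and `[^{}]*` do not inspect the other bracket type, an unbalanced inner bracket such as `<{>` is also accepted by this pattern.

```python
import re

# Pattern from the answer above; '{' and '}' are escaped for portability,
# which does not change what the pattern matches.
PATTERN = re.compile(r'^[^<>{}]*(?:(?:<[^<>]*>|\{[^{}]*\})+[^<>{}]*)*$')

def matches(s: str) -> bool:
    """Return True if the whole string s is accepted by PATTERN."""
    return PATTERN.match(s) is not None

# Strings the question lists as allowed.
print(matches("XXX<XX{X}XXX>X"))      # True
print(matches("X{X}X<X>X{X}X<X>X"))   # True

# Strings the question lists as not allowed.
print(matches("XXX<X<XX>>XX"))        # False
print(matches("XX<XX{XX>XX}XX"))      # False

# Extra case (not in the question): an unbalanced inner brace still passes.
print(matches("X<{>X"))               # True
```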
Let X be shorthand for [^<>{}]. The required regular expression is then:
(X|<(X|{X*})*>|{(X|<X*>)*})*
Or, written out in full,
([^<>{}]|<([^<>{}]|{[^<>{}]*})*>|{([^<>{}]|<[^<>{}]*>)*})*
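The expansion from the shorthand to the full expression can be sketched in Python as follows. The names `SHORT`, `FULL` and `is_valid` are assumptions made for illustration, and `re.fullmatch` is used because the expression carries no anchors and is meant to describe the whole string. Braces are escaped only for portability.

```python
import re

X = r'[^<>{}]'                                   # any non-bracket character
SHORT = r'(X|<(X|\{X*\})*>|\{(X|<X*>)*\})*'      # shorthand form
FULL = SHORT.replace('X', X)                     # substitute X to get the full form

def is_valid(s: str) -> bool:
    """Return True if the entire string s matches the expanded expression."""
    return re.fullmatch(FULL, s) is not None

# Strings the question lists as allowed.
print(is_valid("XXX<XX{X}XXX>X"))      # True
print(is_valid("X{X}X<X>X{X}X<X>X"))   # True

# Strings the question lists as not allowed.
print(is_valid("XXX<X<XX>>XX"))        # False
print(is_valid("XX<XX{XX>XX}XX"))      # False

# Extra case (not in the question): an unbalanced inner brace is rejected.
print(is_valid("X<{>X"))               # False
```

Each top-level alternative is either a plain character, a `< >` pair whose body may contain plain characters and flat `{ }` pairs, or the mirror image for `{ }`, so depth is capped at one level and interleaving has no way to match.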