Operator precedence in regular expressions
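What is the default operator precedence in Oracle's regular expressions when they contain no parentheses?
For example, given H|ha+, is it evaluated as H|h and then concatenated with a, as in ((H|h)a), or is H alternated with ha, as in (H|(ha))?
Also, at what point does the + get applied?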
Given the Oracle doc:
Table 4-2 lists the list of metacharacters supported for use in regular expressions passed to SQL regular expression functions and conditions. These metacharacters conform to the POSIX standard; any differences in behavior from the standard are noted in the "Description" column.
and the entry for | in that table:
The expression a|b matches character a or character b.
and then the POSIX doc:
Operator precedence
The order of precedence of operators is as follows:
Collation-related bracket symbols [==] [::] [..]
Escaped characters \
Character set (bracket expression) []
Grouping ()
Single-character-ERE duplication * + ? {m,n}
Concatenation
Anchoring ^$
Alternation |
I would say that H|ha+ is the same as (?:H|ha+), since alternation comes last in that list: H is one branch and ha+ is the other.
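One way to sanity-check that reading is to compare the unparenthesised pattern with both candidate groupings on a handful of strings. The sketch below uses Python's re module purely as a stand-in; it is not Oracle's engine and not a POSIX ERE implementation, but it follows the same relative precedence for duplication, concatenation and alternation. The sample strings and helper name are made up for illustration.

```python
import re

# The pattern from the question and two explicit groupings of it.
PLAIN = r"H|ha+"
ALTERNATION_LAST = r"(?:H|(?:ha+))"   # H, or h followed by one or more a
ALTERNATION_FIRST = r"(?:(?:H|h)a)+"  # hypothetical reading if | bound tightest

# Illustrative inputs only.
SAMPLES = ["H", "h", "ha", "Ha", "haaa", "haha", "Haa"]


def full_matches(pattern):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in SAMPLES if re.fullmatch(pattern, s)]


plain_hits = full_matches(PLAIN)
print("plain pattern matches:", plain_hits)
print("same as alternation-last grouping: ", plain_hits == full_matches(ALTERNATION_LAST))
print("same as alternation-first grouping:", plain_hits == full_matches(ALTERNATION_FIRST))
```

If the engine you actually care about is Oracle, the same comparison could be repeated there with REGEXP_LIKE, using capturing groups in place of (?:...).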
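Using capturing groups to show the order of evaluation, the regular expression H|ha+ is equivalent to:
(H|(h(a+)))
This is because the precedence rules (listed below) are applied from highest precedence (lowest number) to lowest precedence (highest number):
Rule 5 → (a+): + binds to a, because this operator acts on the single preceding character, back-reference, group ("marked sub-expression" in Oracle parlance), or bracket expression (character class).
Rule 6 → (h(a+)): h is then concatenated with the group from the previous step.
Rule 8 → (H|(h(a+))): H is then alternated with the group from the previous step.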
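The three steps above can be checked mechanically: if the fully parenthesised form really is equivalent, it should find exactly the same matches as the bare pattern, and its groups should show what each rule captured. This is a minimal sketch in Python, assuming its re module as a proxy for the POSIX precedence rules rather than Oracle itself; the test string is arbitrary.

```python
import re

plain = re.compile(r"H|ha+")
grouped = re.compile(r"(H|(h(a+)))")  # rule 5, then rule 6, then rule 8

text = "Haha haaa H h"  # arbitrary sample text

# Both patterns should report identical match positions.
plain_spans = [m.span() for m in plain.finditer(text)]
grouped_spans = [m.span() for m in grouped.finditer(text)]
print(plain_spans == grouped_spans)

# Inspect what each nested group captured for one input:
# group 1 is the whole alternation, group 2 is h(a+), group 3 is a+.
groups = grouped.fullmatch("haaa").groups()
print(groups)

# When the H branch matches, the groups belonging to the other branch stay unset.
h_groups = grouped.fullmatch("H").groups()
print(h_groups)
```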
The precedence table from section 9.4.8 of the POSIX docs for regular expressions (there does not appear to be an official Oracle table):
+---+----------------------------------------------------------+
|   | ERE Precedence (from high to low)                        |
+---+-----------------------------------+----------------------+
| 1 | Collation-related bracket symbols | [==] [::] [..]       |
| 2 | Escaped characters                | \<special character> |
| 3 | Bracket expression                | []                   |
| 4 | Grouping                          | ()                   |
| 5 | Single-character-ERE duplication  | * + ? {m,n}          |
| 6 | Concatenation                     |                      |
| 7 | Anchoring                         | ^ $                  |
| 8 | Alternation                       | |                    |
+---+-----------------------------------+----------------------+
The table above is for Extended Regular Expressions. For Basic Regular Expressions, see section 9.3.7.
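Rows 7 and 8 have a practical consequence that is easy to miss: because anchoring binds more tightly than alternation, an anchor written at one end of an unparenthesised pattern applies only to the branch it touches. The sketch below illustrates this with Python's re module, again as an assumption-laden stand-in for a POSIX ERE engine; the sample strings are invented.

```python
import re

UNPARENTHESISED = r"^H|ha+$"
ANCHORS_PER_BRANCH = r"(?:^H)|(?:ha+$)"  # what the precedence table implies
ANCHORS_AROUND_ALL = r"^(?:H|ha+)$"      # what is often intended instead

samples = ["H", "Hello", "aha", "ha", "haX"]  # illustrative inputs


def hits(pattern):
    """Return the samples in which the pattern is found anywhere."""
    return [s for s in samples if re.search(pattern, s)]


bare = hits(UNPARENTHESISED)
print("bare pattern:      ", bare)
print("anchors per branch:", hits(ANCHORS_PER_BRANCH))
print("anchors around all:", hits(ANCHORS_AROUND_ALL))
```

When both anchors are meant to apply to every branch, wrapping the alternation in a group, as in the third pattern, makes that explicit.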