Operator precedence in regular expressions

What is the default operator precedence in an Oracle regular expression when it contains no parentheses?

For example, given

 H|ha+

is it evaluated as H|h and then concatenated with a, as in ((H|h)a), or is H alternated with ha, as in (H|(ha))?

Also, at what point does the + get applied?

Given the Oracle doc:

Table 4-2 lists the list of metacharacters supported for use in regular expressions passed to SQL regular expression functions and conditions. These metacharacters conform to the POSIX standard; any differences in behavior from the standard are noted in the "Description" column.

and looking at the entry for | in that table:

The expression a|b matches character a or character b.

and then looking at the POSIX doc:

Operator precedence: the order of precedence of operators is as follows:

  1. Collation-related bracket symbols [==] [::] [..]

  2. Escaped characters \

  3. Character set (bracket expression) []

  4. Grouping ()

  5. Single-character-ERE duplication * + ? {m,n}

  6. Concatenation

  7. Anchoring ^$

  8. Alternation |

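I would say that H|ha+ is the same as (?:H|ha+).

Using capturing groups to show the order of evaluation, the regular expression H|ha+ is equivalent to the following: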

(H|(h(a+)))

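This is because the precedence rules (listed below) are applied in order from highest precedence (lowest number) to lowest precedence (highest number):

  • Rule 5 → (a+) The + is grouped with a, because this operator applies to the preceding single character, back-reference, group (a "marked sub-expression" in Oracle parlance), or bracket expression (character class).

  • Rule 6 → (h(a+)) The h is then concatenated with the group from the previous step.

  • Rule 8 → (H|(h(a+))) The H is then alternated with the group from the previous step.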
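The grouping derived above can be checked empirically. The following is a minimal sketch that uses Python's `re` module as a stand-in, which is an assumption: it is neither Oracle's engine nor a strict POSIX ERE implementation, but it applies the same relative precedence to duplication, concatenation, and alternation. The sample strings and the `accepted` helper are made up for illustration.

```python
import re

# Hypothetical sample inputs chosen to tell the candidate groupings apart.
SAMPLES = ["H", "h", "ha", "haaa", "Ha", "Haaa", "haha"]


def accepted(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in SAMPLES if re.fullmatch(pattern, s)]


# The pattern from the question, with no parentheses.
unparenthesized = accepted(r"H|ha+")

# The grouping predicted by the precedence rules (5, then 6, then 8).
by_precedence = accepted(r"(H|(h(a+)))")

# What we would see if alternation bound tighter than concatenation.
alternation_first = accepted(r"(H|h)a+")

# What we would see if + applied to the whole alternation.
plus_last = accepted(r"(H|ha)+")

print("H|ha+        :", unparenthesized)
print("(H|(h(a+)))  :", by_precedence)
print("(H|h)a+      :", alternation_first)
print("(H|ha)+      :", plus_last)
```

Only the fully parenthesized form predicted by the rules accepts the same strings as the bare pattern; the two alternative groupings accept different sets (for example, `Ha` or `haha`).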



The precedence table from section 9.4.8 of the POSIX docs for regular expressions (there does not appear to be an official Oracle table):

+---+----------------------------------------------------------+
|   |             ERE Precedence (from high to low)            |
+---+-----------------------------------+----------------------+
| 1 | Collation-related bracket symbols | [==] [::] [..]       |
| 2 | Escaped characters                | \<special character> |
| 3 | Bracket expression                | []                   |
| 4 | Grouping                          | ()                   |
| 5 | Single-character-ERE duplication  | * + ? {m,n}          |
| 6 | Concatenation                     |                      |
| 7 | Anchoring                         | ^ $                  |
| 8 | Alternation                       | |                    |
+---+-----------------------------------+----------------------+

The table above is for extended regular expressions (ERE). For basic regular expressions (BRE), see section 9.3.7.
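One practical consequence of anchoring (rule 7) ranking above alternation (rule 8) is that an anchor attaches only to its own branch. The sketch below again assumes Python's `re` as a stand-in for an ERE engine, and the sample strings are hypothetical; it compares `^H|ha+$` with the explicitly grouped `^(H|ha+)$`.

```python
import re

# Without parentheses, each anchor belongs to one branch: (^H)|(ha+$).
loose = re.compile(r"^H|ha+$")

# Parentheses are needed to anchor the alternation as a whole.
strict = re.compile(r"^(H|ha+)$")

# Hypothetical inputs for illustration.
samples = ["H", "Hello", "aloha", "haa", "hat"]

loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print("^H|ha+$   :", loose_hits)
print("^(H|ha+)$ :", strict_hits)
```

The unparenthesized pattern accepts any string that merely starts with `H` or ends with `ha+`, so a pattern meant to match the whole input generally needs the explicit group.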