Can I use regular expressions to define strings in ISO EBNF?

Question

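I am using the standardized version of EBNF (ISO/IEC 14977 : 1996(E)) to define my grammar. The standardized notation is self-describing: its own syntax is defined in EBNF, so it can parse itself.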

I define a letter like this:

letter =  'A' | 'B' | 'C' | 'D' | 'E' | 'H' | 'I' | 'J' | 'K' | 'L' |
'O' | 'P' | 'Q' | 'R' | 'S' | 'V' | 'W' | 'X' | 'Y' | 'Z' | 'a' | 'b'
| 'c' | 'd' | 'e' | 'h' | 'i' | 'j' | 'k' | 'l' | 'o' | 'p' | 'q' |
'r' | 's' | 'v' | 'w' | 'x' | 'y' | 'z' | 'F' | 'G' | 'M' | 'N' | 'T' |
'U' | 'f' | 'g' | 'm' | 'n' | 't' | 'u';

I would prefer to write it more simply, as letter = [a..z]|[A..Z];

My question is: would defining letter in this form (using a regular expression) break the self-defining property of EBNF?

Answer 1

Use a special sequence for this. The standard says:

A special-sequence consists of a special-sequence-symbol followed by a (possibly empty) sequence of special-sequence-characters followed by a special-sequence-symbol.

The sequence of symbols represented by a special-sequence is outside the scope of this International Standard. Only the format of a special-sequence is defined in this International Standard. A special-sequence provides a notation for extensions which a user may require.
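As a minimal sketch of what this could look like in practice: in ISO EBNF the special-sequence-symbol is the question mark, so the character class can be placed between two ? marks. The rule names digit and identifier below are hypothetical additions for illustration, and the assumption that the text inside the special sequence is read as a regular-expression character class is a convention of whatever tool consumes the grammar, not something the standard defines.

```ebnf
(* ISO EBNF only fixes the ? ... ? delimiters. How the enclosed text is
   interpreted is an assumption here: the consuming tool is taken to
   read it as a regular-expression character class. *)
letter     = ? [a-zA-Z] ? ;
digit      = ? [0-9] ? ;

(* Ordinary ISO EBNF: concatenation with a comma, repetition with braces. *)
identifier = letter , { letter | digit | "_" } ;
```

Written this way, the grammar is still syntactically valid ISO EBNF, so the notation can still describe itself. What is lost is portability: a reader or tool that does not know the convention cannot tell what the special sequence matches.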

The W3C makes extensive use of this kind of notation. For example, the XML specification states:

The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form

symbol ::= expression

Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lowercase letter. Literal strings are quoted.

Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

#xN
    where N is a hexadecimal integer, the expression matches the character whose number (code point) in ISO/IEC 10646 is N. The number of leading zeros in the #xN form is insignificant.

[a-zA-Z], [#xN-#xN]
    matches any Char with a value in the range(s) indicated (inclusive).

[abc], [#xN#xN#xN]
    matches any Char with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.

[^a-z], [^#xN-#xN]
    matches any Char with a value outside the range indicated.

[^abc], [^#xN#xN#xN]
    matches any Char with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets.

"string"
    matches a literal string matching that given inside the double quotes.

'string'
    matches a literal string matching that given inside the single quotes.

These symbols may be combined to match more complex patterns as follows, where A and B represent simple expressions:

(expression)
    expression is treated as a unit and may be combined as described in this list.

A?
    matches A or nothing; optional A.

A B
    matches A followed by B. This operator has higher precedence than alternation; thus A B | C D is identical to (A B) | (C D).

A | B
    matches A or B.

A - B
    matches any string that matches A but does not match B.

A+
    matches one or more occurrences of A. Concatenation has higher precedence than alternation; thus A+ | B+ is identical to (A+) | (B+).

A*
    matches zero or more occurrences of A. Concatenation has higher precedence than alternation; thus A* | B* is identical to (A*) | (B*).

Other notations used in the productions are:

/* ... */
    comment.

[ wfc: ... ]
    well-formedness constraint; this identifies by name a constraint on well-formed documents associated with a production.

[ vc: ... ]
    validity constraint; this identifies by name a constraint on valid documents associated with a production.
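For comparison, here is a rough sketch of how the letter rule from the question might look in the W3C notation quoted above. The productions are hypothetical and are not taken from the XML specification. Note that this is a different dialect from ISO EBNF: in ISO/IEC 14977 square brackets enclose an optional sequence, so [a-zA-Z] would not be read as a character range there.

```ebnf
/* Hypothetical productions written in the W3C notation, for illustration only. */
Letter     ::= [a-zA-Z]
Digit      ::= [0-9]
Identifier ::= Letter (Letter | Digit | "_")*
```

The bracketed ranges replace the long list of alternatives, at the cost of leaving the ISO notation.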
