Can I use regular expressions to define strings in ISO EBNF?
I'm using the standardized version of EBNF (ISO/IEC 14977:1996(E)) to define my grammar.
The standardized version is a meta-metalanguage: it can be used to define itself.
I define a letter like this:
letter = 'A' | 'B' | 'C' | 'D' | 'E' | 'H' | 'I' | 'J' | 'K' | 'L' |
'O' | 'P' | 'Q' | 'R' | 'S' | 'V' | 'W' | 'X' | 'Y' | 'Z' | 'a' | 'b'
| 'c' | 'd' | 'e' | 'h' | 'i' | 'j' | 'k' | 'l' | 'o' | 'p' | 'q' |
'r' | 's' | 'v' | 'w' | 'x' | 'y' | 'z' | 'F' | 'G' | 'M' | 'N' | 'T' |
'U' | 'f' | 'g' | 'm' | 'n' | 't' | 'u';
I would prefer to write it more simply: letter = [a..z]|[A..Z];
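The range form carries no information beyond the long alternation: each range expands mechanically into plain ISO 14977 terminal strings, so a grammar written with ranges could be preprocessed back into standard EBNF. A minimal Python sketch, assuming two hypothetical helpers (`expand_range`, `to_iso_ebnf`) and single-character, ascending bounds; neither helper is part of ISO/IEC 14977:

```python
from __future__ import annotations


def expand_range(first: str, last: str) -> list[str]:
    """Return every character from first to last, inclusive, in code-point order."""
    # Hypothetical helper: assumes single-character bounds in ascending order.
    if len(first) != 1 or len(last) != 1 or ord(first) > ord(last):
        raise ValueError(f"bad range {first!r}..{last!r}")
    return [chr(cp) for cp in range(ord(first), ord(last) + 1)]


def to_iso_ebnf(name: str, ranges: list[tuple[str, str]]) -> str:
    """Expand ranges into a plain ISO 14977 rule: name = 'a' | 'b' | ... ;"""
    terminals = []
    for first, last in ranges:
        for ch in expand_range(first, last):
            # A terminal string cannot contain its own quote character,
            # so a single quote is written with double quotes instead.
            quote = '"' if ch == "'" else "'"
            terminals.append(f"{quote}{ch}{quote}")
    return f"{name} = {' | '.join(terminals)} ;"


rule = to_iso_ebnf("letter", [("a", "z"), ("A", "Z")])
print(rule.count("|") + 1)  # 52 alternatives
```

The generated rule uses only terminal strings and `|`, so it stays within what the standard can describe about itself; the range syntax lives only in the preprocessing step.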
My question is: would defining letter in this form (using regular expressions) break EBNF's ability to define itself?
Use a special sequence for this. The standard describes it as follows:
A special-sequence consists of a special-sequence-symbol followed by a (possibly empty) sequence of special-sequence-characters followed by a special-sequence-symbol.
The sequence of symbols represented by a special-sequence is outside the scope of this International Standard. Only the format of a special-sequence is defined in this International Standard. A special-sequence provides a notation for extensions which a user may require.
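In ISO 14977 the special-sequence-symbol is `?`, so the rule could be written as `letter = ? [a-zA-Z] ? ;`. Because the standard deliberately leaves the meaning of the text between the question marks undefined, whatever tool consumes the grammar must adopt its own convention. The Python sketch below assumes a hypothetical convention in which the body of a special sequence is a Python regular expression; `compile_special_rule` is an illustrative name, and the rule parsing is simplistic (one rule per string, right-hand side consisting of a single special sequence):

```python
from __future__ import annotations

import re

# Matches a rule of the form:  name = ? body ? ;
# The name is a letter followed by letters, digits or spaces (simplified).
SPECIAL_RULE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9 ]*?)\s*=\s*\?(.*)\?\s*;\s*$")


def compile_special_rule(rule: str) -> tuple[str, re.Pattern[str]]:
    """Return (rule name, compiled regex) for a single special-sequence rule.

    Hypothetical convention: the special-sequence text is treated as a
    regular expression. ISO/IEC 14977 itself assigns it no meaning.
    """
    m = SPECIAL_RULE.match(rule)
    if m is None:
        raise ValueError(f"not a single special-sequence rule: {rule!r}")
    name, body = m.group(1), m.group(2).strip()
    return name, re.compile(body)


name, pattern = compile_special_rule("letter = ? [a-zA-Z] ? ;")
print(name)                          # letter
print(bool(pattern.fullmatch("Q")))  # True
print(bool(pattern.fullmatch("7")))  # False
```

The grammar itself remains valid ISO 14977, since a special sequence is part of the standard's syntax; only the interpretation of its contents is an extension.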
The W3C uses this kind of notation extensively. For example, the XML specification says:
The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form
symbol ::= expression
Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lowercase letter. Literal strings are quoted.
Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:
#xN
where N is a hexadecimal integer, the expression matches the character whose number (code point) in ISO/IEC 10646 is N. The number of leading zeros in the #xN form is insignificant.
[a-zA-Z], [#xN-#xN]
matches any Char with a value in the range(s) indicated (inclusive).
[abc], [#xN#xN#xN]
matches any Char with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.
[^a-z], [^#xN-#xN]
matches any Char with a value outside the range indicated.
[^abc], [^#xN#xN#xN]
matches any Char with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets.
"string"
matches a literal string matching that given inside the double quotes.
'string'
matches a literal string matching that given inside the single quotes.
These symbols may be combined to match more complex patterns as follows, where A and B represent simple expressions:
(expression)
expression is treated as a unit and may be combined as described in this list.
A?
matches A or nothing; optional A.
A B
matches A followed by B. This operator has higher precedence than alternation; thus A B | C D is identical to (A B) | (C D).
A | B
matches A or B.
A - B
matches any string that matches A but does not match B.
A+
matches one or more occurrences of A. Concatenation has higher precedence than alternation; thus A+ | B+ is identical to (A+) | (B+).
A*
matches zero or more occurrences of A. Concatenation has higher precedence than alternation; thus A* | B* is identical to (A*) | (B*).
Other notations used in the productions are:
/* ... */
comment.
[ wfc: ... ]
well-formedness constraint; this identifies by name a constraint on well-formed documents associated with a production.
[ vc: ... ]
validity constraint; this identifies by name a constraint on valid documents associated with a production.
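The character-class expressions in the XML notation translate almost directly into regular-expression character classes; the main extra step is rewriting each `#xN` code point. A minimal Python sketch, assuming a hypothetical `w3c_to_regex` helper that handles only the single-character forms quoted above (`#xN`, `[...]`, `[^...]`), not quoted strings or the combining operators:

```python
from __future__ import annotations

import re

# One #xN reference: '#x' followed by hexadecimal digits.
HEX_REF = re.compile(r"#x([0-9A-Fa-f]+)")


def _hex_to_escape(m: re.Match[str]) -> str:
    """Turn #xN into a \\U escape for code point N (leading zeros are insignificant)."""
    return "\\U%08X" % int(m.group(1), 16)


def w3c_to_regex(expr: str) -> re.Pattern[str]:
    """Compile a W3C EBNF character expression such as '[#x41-#x5A]' or '[^abc]'.

    Handles only #xN references and bracketed classes; literal characters
    inside the brackets are assumed not to need regex escaping.
    """
    return re.compile(HEX_REF.sub(_hex_to_escape, expr))


upper = w3c_to_regex("[#x41-#x5A]")
print(bool(upper.fullmatch("Q")))    # True
print(bool(upper.fullmatch("q")))    # False
not_abc = w3c_to_regex("[^abc]")
print(bool(not_abc.fullmatch("d")))  # True
```

Using a replacement function with `re.sub` means the returned `\U...` text is inserted literally, and `re` then interprets it as an escape when the pattern is compiled.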