Why is a binary operator selected over a unary operator?

Question

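For operators that are both unary and binary, why is the binary one selected in an expression like a@b?

After much thought and searching, I still can't work out why a+b is parsed as a binary expression rather than as a (+b), which is obviously gibberish.

I don't think a context-free grammar can distinguish between the two, and looking for the answer in this version of the standard hasn't turned anything up.

Does the parser specifically choose the binary version because the unary version would be gibberish? If so, is there a section of the standard that spells this out?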

Answer 1

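Context-free doesn't mean "stateless". A parser keeps plenty of state to track which grammar rules are still possible given the tokens it has seen so far, and to predict which tokens may come next. Because there is no rule saying that two expressions can appear directly next to each other, it never even considers that a+b might be the expressions a and +b side by side.

For example, suppose we're working with this basic grammar: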

expr → expr '+' unary_expr | unary_expr
unary_expr → '+' unary_expr | IDENT

(Notation: → gives the rules a nonterminal can expand to, and | separates alternatives. '+' is the plus-sign token and IDENT is any identifier token.)
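To get a feel for what this grammar accepts, here is a minimal recursive-descent sketch in C++. It is a different technique from the rule-tracking walkthrough in this answer, and it rests on several assumptions: identifiers are single letters, every character is its own token (so there is no `++` token as in real C++), and the left recursion in `expr` is rewritten as a loop. The `Parser` class and its method names are hypothetical, made up for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy parser: returns a fully parenthesized form of the tree.
class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    std::string parse() {
        std::string tree = expr();
        // No rule lets anything follow a complete expr, so leftovers are errors.
        if (pos_ != src_.size()) throw std::runtime_error("unexpected token");
        return tree;
    }

private:
    // expr -> expr '+' unary_expr | unary_expr  (left recursion as a loop)
    std::string expr() {
        std::string left = unaryExpr();
        while (peek() == '+') {  // a '+' right after an operand is binary
            ++pos_;
            left = "(" + left + " + " + unaryExpr() + ")";
        }
        return left;
    }

    // unary_expr -> '+' unary_expr | IDENT
    std::string unaryExpr() {
        if (peek() == '+') {  // a '+' where an operand must start is unary
            ++pos_;
            return "(+" + unaryExpr() + ")";
        }
        if (std::isalpha(static_cast<unsigned char>(peek())))
            return std::string(1, src_[pos_++]);
        throw std::runtime_error("expected '+' or identifier");
    }

    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }

    std::string src_;
    std::size_t pos_ = 0;
};
```

With this sketch, `Parser("a+b").parse()` gives `(a + b)`, while `Parser("ab").parse()` throws, because nothing in the grammar allows two expressions to sit side by side.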

Let's parse a+b. Our parser's starting state will be:

1. expr → expr '+' unary_expr
         ^
2. expr → unary_expr
         ^
3. unary_expr → '+' unary_expr
               ^
4. unary_expr → IDENT
               ^

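To start, it's considering several rules. It doesn't know which one it will get; it could be any of them. Notice that each production it's considering also includes a cursor, which I've marked above with a ^ caret. That's where the parser is within the rule.

Okay, now it sees the first IDENT token. It updates its state to the following: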

1. expr → expr '+' unary_expr
              ^
2. expr → unary_expr
                    ^
3. unary_expr → IDENT
                     ^

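Now it's considering three rules. Notice that the cursors have moved to the right.

If the first rule is the right one, then it has just seen an expression and is expecting a '+' next. Or perhaps the second rule is the right one, and a is just a unary expression. In that case it doesn't expect any more tokens to follow. The parser doesn't know which it will be, so it's considering both.

You can see that if the next token is a '+', it must be a binary plus. Why? Because the first rule is the only one that expects a '+' token next.

To interpret the '+' as a unary plus, the parser would need to have this rule active, with the cursor before the '+':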

unary_expr → '+' unary_expr 
            ^

You can see that it doesn't.
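The walkthrough above can be reproduced mechanically. Below is a rough sketch that tracks (rule, cursor) pairs in the style of an Earley recognizer. That choice is an assumption made for illustration: it is not what any particular C++ compiler does. All names (`Item`, `saturate`, `buildChart`, `itemsExpecting`, `accepts`) are hypothetical, and the `origin` field is an extra piece of bookkeeping not shown in the walkthrough: it records where in the input each rule started.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

struct Rule { std::string lhs; std::vector<std::string> rhs; };

// The grammar from the answer; "+" and "IDENT" are the terminals.
const std::vector<Rule> kRules = {
    {"expr", {"expr", "+", "unary_expr"}},
    {"expr", {"unary_expr"}},
    {"unary_expr", {"+", "unary_expr"}},
    {"unary_expr", {"IDENT"}},
};

// A rule under consideration: cursor position plus where the rule started.
struct Item {
    std::size_t rule, dot, origin;
    bool operator<(const Item& o) const {
        return std::tie(rule, dot, origin) < std::tie(o.rule, o.dot, o.origin);
    }
};
using ItemSet = std::set<Item>;

// True if the cursor of `it` sits directly before `symbol`.
bool expects(const Item& it, const std::string& symbol) {
    const Rule& rule = kRules[it.rule];
    return it.dot < rule.rhs.size() && rule.rhs[it.dot] == symbol;
}

// Render an item as text, with " ." marking the cursor.
std::string show(const Item& it) {
    const Rule& rule = kRules[it.rule];
    std::string text = rule.lhs + " ->";
    for (std::size_t i = 0; i <= rule.rhs.size(); ++i) {
        if (i == it.dot) text += " .";
        if (i < rule.rhs.size()) text += " " + rule.rhs[i];
    }
    return text;
}

// Apply the predict and complete steps until chart[pos] stops growing.
void saturate(std::vector<ItemSet>& chart, std::size_t pos) {
    for (bool changed = true; changed;) {
        changed = false;
        const ItemSet snapshot = chart[pos];  // copy: chart[pos] grows below
        for (const Item& it : snapshot) {
            const Rule& rule = kRules[it.rule];
            if (it.dot == rule.rhs.size()) {
                // Complete: advance every item that was waiting for rule.lhs.
                assert(it.origin < pos);  // holds because no rule is empty
                for (const Item& waiting : chart[it.origin])
                    if (expects(waiting, rule.lhs))
                        changed |= chart[pos].insert(
                            Item{waiting.rule, waiting.dot + 1, waiting.origin}).second;
            } else {
                // Predict: add the rules of the symbol after the cursor
                // (nothing matches when that symbol is a terminal).
                for (std::size_t i = 0; i < kRules.size(); ++i)
                    if (kRules[i].lhs == rule.rhs[it.dot])
                        changed |= chart[pos].insert(Item{i, 0, pos}).second;
            }
        }
    }
}

// chart[k] holds the items still alive after k tokens have been read.
std::vector<ItemSet> buildChart(const std::vector<std::string>& tokens) {
    std::vector<ItemSet> chart(tokens.size() + 1);
    chart[0].insert(Item{0, 0, 0});
    chart[0].insert(Item{1, 0, 0});
    saturate(chart, 0);
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        for (const Item& it : chart[i])  // scan: move the cursor past the token
            if (expects(it, tokens[i]))
                chart[i + 1].insert(Item{it.rule, it.dot + 1, it.origin});
        saturate(chart, i + 1);
    }
    return chart;
}

// The items in `items` whose cursor is directly before `token`.
std::vector<std::string> itemsExpecting(const ItemSet& items, const std::string& token) {
    std::vector<std::string> out;
    for (const Item& it : items)
        if (expects(it, token)) out.push_back(show(it));
    return out;
}

// Accept if some expr rule spanning the whole input has been completed.
bool accepts(const std::vector<std::string>& tokens) {
    for (const Item& it : buildChart(tokens).back())
        if (kRules[it.rule].lhs == "expr" && it.origin == 0 &&
            it.dot == kRules[it.rule].rhs.size())
            return true;
    return false;
}
```

For the token sequence IDENT, '+', IDENT, `chart[1]` holds exactly the three items shown in the second state above, and `itemsExpecting(chart[1], "+")` returns only `expr -> expr . + unary_expr`, the binary rule. Feeding it IDENT, IDENT leaves no live items, so `accepts` returns false.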

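If context-free doesn't mean stateless, then what does it mean? What "context" are we "free" from?

Context-free is a restriction on what rules a grammar can contain. The opposite is context-sensitive, where productions can vary depending on the surrounding context. Context-sensitive grammars are more powerful than context-free ones, but they are much harder to parse, even for humans! Language theorists figured out long ago that context-free grammars hit a sweet spot: powerful enough to be expressive, yet not terribly complicated to reason about.

For details, see: Context-free grammars versus context-sensitive grammars?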

A context-free grammar (CFG) is a grammar where (as you noted) each production has the form A → w, where A is a nonterminal and w is a string of terminals and nonterminals. Informally, a CFG is a grammar where any nonterminal can be expanded out to any of its productions at any point. The language of a grammar is the set of strings of terminals that can be derived from the start symbol.

A context-sensitive grammar (CSG) is a grammar where each production has the form wAx → wyx, where w and x are strings of terminals and nonterminals and y is a nonempty string of terminals and nonterminals. In other words, the productions give rules saying "if you see A in a given context, you may replace A by the string y." It's unfortunate that these grammars are called "context-sensitive grammars" because it means that "context-free" and "context-sensitive" are not opposites, and it means that there are certain classes of grammars that arguably take a lot of contextual information into account but aren't formally considered to be context-sensitive.

c++

parsing

expression

operators

language-lawyer