Inconsistency parsing numeric literals according to the C++ Standard's grammar
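Reading through the C++17 standard, it seems to me that there is an inconsistency between pp-number, as handled by the preprocessor, and numeric literals such as user-defined-integer-literal, as they are defined to be handled by the "upper" language.
For example, the following is correctly parsed as a pp-number according to the preprocessor grammar: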
123_e+1
But put in the context of a C++11-compliant code fragment,
int operator"" _e(unsigned long long)
    { return 0; }
int test()
{
    return 123_e+1;
}
current Clang or GCC compilers (I haven't tested others) report an error similar to this:
unable to find numeric literal operator 'operator""_e+1'
where operator"" _e(...) is not found, and trying to define operator"" _e+1(...) would be invalid.
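This seems to happen because the compiler first lexes the token as a pp-number, but then fails to roll back and apply the grammar rules for a user-defined-integer-literal when parsing the final expression.
In comparison, the following code compiles fine: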
int operator"" _d(unsigned long long)
    { return 0; }
int test()
{
    return 0x123_d+1; // doesn't lex as a 'pp-number' because 'sign' can only follow [eEpP]
}
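Is this a correct reading of the standard? And if so, is it reasonable to expect the compiler to handle this, arguably rare, corner case?
You have fallen victim to the maximal munch rule, under which the lexical analyzer takes as many characters as possible to form a valid token.
This is covered in [lex.pptoken]p3, which says (emphasis mine):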
Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name ([lex.header]) is only formed within a #include directive.
and includes several examples:
[ Example:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
— end example ]
4 [ Example: The program fragment 0xe+foo is parsed
as a preprocessing number token (one that is not a valid floating or
integer literal token), even though a parse as three preprocessing
tokens 0xe, +, and foo might produce a valid expression (for example,
if foo were a macro defined as 1). Similarly, the program fragment 1E1
is parsed as a preprocessing number (one that is a valid floating
literal token), whether or not E is a macro name. — end example ]
5 [ Example: The program fragment x+++++y is parsed as x ++ ++ + y,
which, if x and y have integral types, violates a constraint on
increment operators, even though the parse x ++ + ++ y might yield a
correct expression. — end example ]
This rule also has effects in several other well-known cases, such as a+++++b.
For reference, the pp-number grammar is as follows:
pp-number:
digit
. digit
pp-number digit
pp-number identifier-nondigit
pp-number ' digit
pp-number ' nondigit
pp-number e sign
pp-number E sign
pp-number p sign
pp-number P sign
pp-number .
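Note the e sign production, which is what snags this case: in 123_e+1 the e at the end of the suffix is followed by +, so the + is absorbed into the pp-number. If, on the other hand, you use d as in your second example, you do not hit this (see it live on godbolt).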
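To make the effect of the grammar concrete, here is a minimal sketch of a maximal-munch scanner for pp-number. It is an illustration only, not how GCC or Clang implement their lexers: lex_pp_number and check_examples are hypothetical names introduced here, and universal-character-names are ignored to keep it short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any compiler): returns the longest prefix
// of `src` that matches the pp-number grammar quoted above, or an empty
// string if `src` does not start with a pp-number.
std::string lex_pp_number(const std::string& src) {
    auto is_digit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };
    auto is_nondigit = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
    };

    std::size_t i = 0;
    // Base cases: "digit" or ". digit".
    if (!src.empty() && is_digit(src[0])) {
        i = 1;
    } else if (src.size() >= 2 && src[0] == '.' && is_digit(src[1])) {
        i = 2;
    } else {
        return {};
    }

    // Keep extending the token for as long as some production applies.
    while (i < src.size()) {
        const char c = src[i];
        const bool has_next = i + 1 < src.size();
        const char next = has_next ? src[i + 1] : '\0';
        const bool exp_char = (c == 'e' || c == 'E' || c == 'p' || c == 'P');
        if (exp_char && has_next && (next == '+' || next == '-')) {
            i += 2;  // pp-number e sign (and the E, p, P variants)
        } else if (is_digit(c) || is_nondigit(c) || c == '.') {
            i += 1;  // pp-number digit / identifier-nondigit / .
        } else if (c == '\'' && has_next && (is_digit(next) || is_nondigit(next))) {
            i += 2;  // pp-number ' digit / ' nondigit
        } else {
            break;   // no production applies: the token ends here
        }
    }
    return src.substr(0, i);
}

// The fragments discussed in this question and in the standard's example.
void check_examples() {
    assert(lex_pp_number("123_e+1") == "123_e+1");   // '+' is swallowed
    assert(lex_pp_number("0x123_d+1") == "0x123_d"); // '+' ends the token
    assert(lex_pp_number("0xe+foo") == "0xe+foo");   // standard's example
}
```

Under these assumptions, 123_e+1 comes out as a single token whose ud-suffix would have to be _e+1, which matches the compiler diagnostic quoted in the question.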
Adding spacing also fixes your issue, since you are then no longer subject to maximal munch (see it live on godbolt):
123_e + 1
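As a compilable sketch of the fix, the fragment below uses the same literal operator as the question and ends the pp-number before the + in two ways: with whitespace, and with parentheses. The function names with_spaces and with_parens are made up for this example, and the operator is spelled without a space after the quotes, which should be accepted by any reasonably recent compiler.

```cpp
#include <cassert>

// Same literal operator as in the question; it ignores its argument.
int operator""_e(unsigned long long) { return 0; }

// Whitespace ends the pp-number after "123_e", so '+' is its own token.
int with_spaces() { return 123_e + 1; }

// A closing parenthesis ends the pp-number just as well.
int with_parens() { return (123_e)+1; }
```

Both functions evaluate 0 + 1, because the literal is now lexed as 123_e and the + is parsed as the binary operator.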