C lexer: token concatenation of unterminated string literals

Question

Consider the following C code:

#include <stdio.h>

#define PRE(a) " ## a

int main() {
    printf("%s\n", PRE("));
    return 0;
}

If we strictly follow the C99 tokenization rules, I would expect this to break down into:

...
[#] [define] [PRE] [(] [a] [)] ["]* [##] [a]
...
[printf] [(] ["%s\n"] [,] [PRE] [(] ["]* [)] [)] [;]
...

* a single non-white-space character that does not match any preprocessing-token pattern

So after the preprocessing directives have run, the printf line should become:

printf("%s\n", "");

and parse normally. However, gcc reports an error when compiling it, even with the flags -std=c99 -pedantic. What am I missing?

Answer 1

From C11 6.4 Lexical elements:

      preprocessing-token:
             header-name
             identifier
             pp-number
             character-constant
             string-literal
             punctuator
             each non-white-space character that cannot be one of the above
3 [...] The categories of preprocessing tokens are: header names, identifiers, preprocessing numbers, character constants, string literals, punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories.69) If a ' or a " character matches the last category, the behavior is undefined. [...]

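So if " is not part of a string literal but a lone non-white-space character, the behavior is undefined. I don't know why it is undefined rather than a hard error; I think it is to allow compilers to parse multi-line string literals.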
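The distinction the quoted paragraph draws is easier to see with a character that is not a punctuator at all. A minimal sketch, assuming GCC: in C99 `@` is not in the basic source character set, so accepting it is implementation-defined, and GCC does. The macro `STR` and the function `check_at_sign` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Stringize the argument's spelling into a string literal. */
#define STR(a) #a

/* '@' is not a C punctuator, so it lexes as a "single non-white-space
   character" preprocessing token. Stringizing it is well defined; a
   diagnostic would only be required if the bare '@' survived into the
   tokens parsed by the compiler proper (translation phase 7). */
static void check_at_sign(void) {
    assert(strcmp(STR(@), "@") == 0);  /* STR(@) expands to "@" */
}

/* STR(") has no such way out: a lone " falls into the same last
   category, and for ' and " that is undefined behavior already in
   translation phase 3, before STR would be expanded in phase 4. */
```

The only difference between the two cases is which character lands in the "everything else" category; the standard singles out ' and " as undefined.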

it throws an error

But on godbolt:

<source>:3:16: warning: missing terminating " character
    3 | #define PRE(a) " ## a
      |                ^
<source>:6:24: warning: missing terminating " character
    6 |     printf("%s\n", PRE("));
      |                        ^
<source>:8: error: unterminated argument list invoking macro "PRE"
    8 | }
      | 
<source>: In function 'int main()':
<source>:6:20: error: 'PRE' was not declared in this scope
    6 |     printf("%s\n", PRE("));
      |                    ^~~
<source>:6:20: error: expected '}' at end of input
<source>:5:12: note: to match this '{'
    5 | int main() {
      |            ^

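It throws the error (probably) not on the #define PRE line, but on the PRE(") line. GCC treats the lone " together with the rest of that line, "));, as one unterminated token, so the closing parentheses never reach the macro invocation and the argument list of PRE is never closed.

Tokens are recognized before macro replacement (translation phase 3 versus phase 4). So whatever you do, a lone " cannot become part of a string literal through macro replacement, whether by pasting two macros together or the way you tried: by the time ## runs in phase 4, the " has already been lexed in phase 3, and that alone is undefined behavior.

Note that -pedantic does not turn warnings into errors. It issues the diagnostics the standard requires (-pedantic-errors is the flag that makes them errors), but here the standard says the behavior is undefined, so no diagnostic is required at all.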
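What does work is applying # or ## to tokens that are already complete after phase 3. A minimal sketch, assuming C99 or later (for the empty macro argument); the macros `STR`, `WIDEN`, `CAT`, the variable `foobar` and the function `check_expansions` are hypothetical names chosen for illustration, not taken from the question.

```c
#include <assert.h>
#include <stddef.h>  /* wchar_t */

#define STR(a)    #a       /* stringize the argument's spelling */
#define WIDEN(s)  L ## s   /* paste the identifier L onto a string literal */
#define CAT(a, b) a ## b   /* paste two identifiers into one */

static int foobar = 42;

static void check_expansions(void) {
    /* STR() passes one empty argument; stringizing an empty argument
       yields "" (C99 6.10.3.2), i.e. the string the question wanted,
       with no lone " character anywhere in the source. */
    const char *empty = STR();
    assert(empty[0] == '\0' && sizeof STR() == 1);

    /* "abc" was already a complete string-literal token after phase 3,
       so ## can join it with L into the single valid token L"abc". */
    const wchar_t *wide = WIDEN("abc");
    assert(wide[0] == L'a' && wide[3] == L'\0');

    /* ## joins foo and bar into the identifier foobar. */
    assert(CAT(foo, bar) == 42);
}
```

The `L ## s` pattern is a well-known idiom for widening string literals; it is legal because the operands are already whole tokens and the pasted result is itself a valid preprocessing token (otherwise the behavior would be undefined, C99 6.10.3.3).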
