C preprocessor: line continuation: why exactly is a comment not allowed after the backslash character ('\')?

Question

Valid code:

#define M xxx\
yyy

Invalid code:

#define M xxx\/*comment*/
yyy

#define M xxx\//comment
yyy

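Questions:

Why is a comment not allowed after the backslash character (\)?
What does the standard say?

Update. Extra question:

What is the motivation / reason / argumentation behind the requirement that (in order to achieve splicing of physical source lines) the backslash character (\) must be immediately followed by a new-line character? What is the obstacle to allowing comments (or spaces) after the backslash character (\)?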

Answer 1

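Lines are spliced together only if the backslash character is the last character on the line. C 2018 5.1.1.2 specifies the phases of translation of a C program. In phase 2: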

Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines…

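If a comment follows the backslash character, the backslash character is not followed by a new-line character, so no splicing is performed. Comments are processed in phase 3: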

The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments)… Each comment is replaced by one space character…
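As a practical consequence of this ordering, a block comment can still appear on a continued line as long as it comes before the backslash, so that the backslash stays immediately in front of the new-line. A minimal sketch (the macro SUM_OF_THREE and the function sum_demo are made-up names for illustration, not taken from the question):

```c
/* Each comment sits before the backslash, so every backslash is still
   immediately followed by a new-line and phase 2 splices the lines.
   Phase 3 then replaces each comment with a single space. */
#define SUM_OF_THREE(a, b, c) ((a) /* first */ + \
                               (b) /* second */ + \
                               (c))

/* Hypothetical helper that just expands the macro once. */
int sum_demo(void)
{
    return SUM_OF_THREE(1, 2, 3);
}
```

A // comment in the same position would not behave the same way: phase 2 would splice the following line onto the end of the comment before phase 3 ever looks at it.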

Regarding the added question:

What is the motivation / reason / argumentation behind the requirement that (in order to achieve splicing of physical source lines) backslash character (\) must immediately follow by a new-line character? What is the obstacle to allow comments (or spaces) after the backslash character (\)?

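The earliest processing in compiling a C program is the simplest. Early C compilers may have been implemented as layers of simple filters: first, local-environment characters or methods of file storage would be translated into a simple stream of characters, then lines would be spliced together (perhaps to deal with wanting a long source line while having to type the source code on 80-column punched cards), then comments would be removed, and so on.

Splicing together lines marked with a backslash at the end of the line is easy; it only requires looking at two characters. In contrast, if we allowed comments to follow a backslash that marks a splice, it would become complicated:

A backslash followed by a comment followed by a new-line would be spliced, but a backslash followed by a comment followed by other source code would not. That requires looking ahead over possibly many characters and parsing the comment delimiters, possibly for multiple comments.
One purpose of splicing lines was to allow long strings to be continued across multiple lines. (This was before adjacent strings were concatenated in C.) So a line containing "abc\ and another containing def" would be spliced together, making "abcdef". While we might allow comments after a backslash intended to join lines, we would not want to splice after a line containing "abc\ /*" /*comment*/. That means the code doing the splicing would have to be context-sensitive; if the backslash appears inside a quoted string, it has to be treated differently.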
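To make the "only two characters" point concrete, here is a rough sketch of what a phase-2 filter could look like. It is an illustration under simplifying assumptions, not how any particular compiler does it: the name splice_lines is hypothetical, the input is assumed to be an in-memory NUL-terminated string, and trigraphs (phase 1) are ignored.

```c
#include <stddef.h>

/* Hypothetical helper: copies src to dst while deleting every backslash
   that is immediately followed by a new-line. dst must have room for at
   least strlen(src) + 1 bytes. Returns the length of the result. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;          /* drop the pair; one character of lookahead */
        } else {
            dst[n++] = *src++; /* everything else is copied unchanged */
        }
    }
    dst[n] = '\0';
    return n;
}
```

The filter needs no knowledge of comments or string literals. With the valid macro from the question, the two physical lines become the single logical line "#define M xxxyyy"; with a comment between the backslash and the new-line, the input passes through unchanged because the backslash is followed by '/' rather than '\n'.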

Answer 2

According to 5.1.1.2 Translation phases of the C11 standard (note the bold text I added):

5.1.1.2 Translation phases

1 The precedence among the syntax rules of translation is specified by the following phases.

1 Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.

2 Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place.

...

Only a backslash character immediately followed by a new-line causes lines to be spliced. A comment is not a new-line character.

Answer 3

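There actually is a reason for processing backslash-newlines before removing comments. It is the same reason why backslash-newlines are removed entirely, instead of being replaced with (virtual) horizontal whitespace the way comments are. It's a ridiculous reason, but it's the official reason: it is so you can mechanically force-fit C code with long lines onto punched cards, by inserting a backslash-newline at column 79 no matter what it happens to divide: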

static int cp_old_stat(struct kstat *stat, struct __old_kernel_stat __user * st\
atbuf)
{
        static int warncount = 5;
        struct __old_kernel_stat tmp;

        if (warncount > 0) {
                warncount--;
                printk(KERN_WARNING "VFS: Warning: %s using old stat() call. Re\
compile your binary.\n",

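(This is the first chunk of C I found on my hard drive that actually had lines that wouldn't fit on punched cards.)

For this to work as intended, backslash-newline has to be able to split a /* or a */, like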

/* this comment just so happens to be exactly 80 characters wide at the close *\
/

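And you can't have it both ways: if comments were removed before backslash-newlines were processed, then a backslash-newline could not affect comment boundaries; conversely, if backslash-newlines are processed first, then comments cannot appear between the backslash and the newline.

(I Am Not Making This Up™: The C99 Rationale, section 5.1.1.2, paragraph 30 reads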

A backslash immediately before a newline has long been used to continue string literals, as well as preprocessing command lines. In the interest of easing machine generation of C, and of transporting code to machines with restrictive physical line lengths, the C89 Committee generalized this mechanism to permit any token to be continued by interposing a backslash/newline sequence.

Emphasis in original. Sorry, I don't know of any non-PDF version of this document.)
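For anyone who wants to check that splicing really does run before comments and tokens are recognized, the following small sketch should compile with a conforming compiler. The function name split_identifier_demo and the variable counter are invented for this illustration.

```c
/* The closing delimiter of this comment is split across two lines: *\
/

/\
* The opening delimiter of this comment is split as well. */

/* Hypothetical demo: the identifier "counter" is split in the middle.
   The continuation line must start in column 1, because any leading
   white space would end up inside the spliced token. */
int split_identifier_demo(void)
{
    int coun\
ter = 41;
    return counter + 1;
}
```

Since phase 2 deletes each backslash-newline pair before phase 3 looks for comments and preprocessing tokens, the compiler sees two ordinary comments and a single identifier.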
