Compilation of string literals

Why can two string literals separated by a space, a tab, or "\n" be compiled without an error?

int main()
{
   char * a = "aaaa"  "bbbb";
} 

"aaaa" is a char*, and "bbbb" is a char*.

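There is no specific concatenation rule for handling the two string literals. And obviously the following code gives an error during compilation: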

#include <iostream>
int main()
{
   char * a = "aaaa";
   char * b = "bbbb";
   std::cout << a b;
}

Is this concatenation common to all compilers? Where is the null terminator of "aaaa"? Is "aaaabbbb" a contiguous block of RAM?

If you look at, for example, this translation phase reference, phase 6 says:

Adjacent string literals are concatenated.

That is exactly what happens here. You have two adjacent string literals, and they are concatenated into a single string literal.

This is standard behavior.

As you can see, it works only for string literals, not for two pointer variables.

String literals placed side-by-side are concatenated at translation phase 6 (after the preprocessor). That is, "Hello," " world!" yields the (single) string "Hello, world!". If the two strings have the same encoding prefix (or neither has one), the resulting string will have the same encoding prefix (or no prefix).

(source)

In this declaration

char * a = "aaaa"  "bbbb";

the compiler, in a translation step that comes before syntax analysis, treats the adjacent string literals as one literal.

So for the compiler, the declaration above is equivalent to

char * a = "aaaabbbb";

that is, the compiler stores only one string literal, "aaaabbbb".

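Adjacent string literals are concatenated according to the rules of the C (and C++) standard. No such rule exists for adjacent identifiers (that is, the variables a and b).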

To quote C++14 (N3797 draft), § 2.14.5:

In translation phase 6 (2.2), adjacent string literals are concatenated. If both string literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix. If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand. If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally-supported with implementation-defined behavior.

In C and C++, adjacent string literals are compiled as a single string literal. For example:

"Some text..." "and more text"

is equivalent to:

"Some text...and more text"

The reason is historical:

The original C language was designed in 1969-1972 when computing was still dominated by the 80 column punched card. Its designers used 80 column devices such as the ASR-33 Teletype. These devices did not automatically wrap text, so there was a real incentive to keep source code within 80 columns. Fortran and Cobol had explicit continuation mechanisms to do so, before they finally moved to free format.

It was a stroke of brilliance for Dennis Ritchie (I assume) to realise that there was no ambiguity in the grammar and that long ASCII strings could be made to fit into 80 columns by the simple expedient of getting the compiler to concatenate adjacent literal strings. Countless C programmers were grateful for that small feature.

Once the feature is in, why would it ever be removed? It causes no grief and is frequently handy. I for one wish more languages had it. The modern trend is to have extended strings with triple quotes or other symbols, but the simplicity of this feature in C has never been outdone.
