C++: Is there a standard definition for end-of-line in a multi-line string constant?

Question

If I have a multi-line C++11 string constant such as

R"(line 1
line 2
line3)"

is it defined which characters the line terminator/separator consists of?

Answer 1

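(Note: The question has changed significantly since this answer was posted. Only half of it remains, namely the pure C++ aspect. The networking focus of this answer addresses the original question, "send a multi-line string to a server with well-defined end-of-line requirements". I do not generally chase the evolution of questions.)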

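Within a program, the C++ standard for newline is \n. This is also what a newline in a raw literal becomes. There is no special convention for raw literals.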

Usually \n maps to the ASCII linefeed, i.e. value 10.

I'm not sure what it maps to in EBCDIC, but you can check that if needed.

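However, my impression is that most protocols use ASCII carriage return plus linefeed, i.e. 13 followed by 10. This is sometimes called CRLF, after the ASCII abbreviations CR for carriage return and LF for linefeed. When the C++ escapes map to ASCII, this is simply \r\n in C++.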

You need to adhere to the requirements of whatever protocol you're using.

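For ordinary file/stream i/o, the C++ standard library takes care of mapping the internal \n to whatever convention the host environment uses. This is called text mode, as opposed to binary mode, where no mapping is performed.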
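A minimal sketch of the text/binary difference, assuming a writable current directory. The helper names write_file, read_bytes and demo_modes and the file names are illustrative only. The file written in binary mode always holds exactly the bytes a, LF, b; the file written in text mode holds whatever the host convention is, so the check accepts either form.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Writes `text` to `path`, either in text mode (host end-of-line mapping
// applies) or in binary mode (bytes are written unchanged).
void write_file(const std::string& path, const std::string& text, bool binary) {
    std::ios_base::openmode mode = std::ios_base::out | std::ios_base::trunc;
    if (binary) {
        mode |= std::ios_base::binary;
    }
    std::ofstream out(path, mode);
    out << text;
}  // the stream is flushed and closed here

// Reads the file back in binary mode, so no mapping hides what is on disk.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios_base::in | std::ios_base::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

void demo_modes() {
    write_file("eol_binary.txt", "a\nb", true);
    assert(read_bytes("eol_binary.txt") == "a\nb");

    write_file("eol_text.txt", "a\nb", false);
    const std::string on_disk = read_bytes("eol_text.txt");
    // "a\nb" on POSIX systems, typically "a\r\nb" on Windows.
    assert(on_disk == "a\nb" || on_disk == "a\r\nb");
}
```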

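For network i/o, which the standard library does not cover, application code must do this mapping itself, either directly or via some library function.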
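As an illustration of doing the mapping yourself, here is a sketch of a hypothetical helper to_crlf that turns each internal \n into the CR LF pair before the text is handed to a CRLF-based protocol. It assumes the input contains only bare \n line ends, and that \r and \n map to ASCII 13 and 10 on the target.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces each '\n' with the CR LF pair "\r\n".
// Assumes the input uses bare '\n' line ends only; an existing "\r\n"
// pair would become "\r\r\n".
std::string to_crlf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (c == '\n') {
            out += '\r';  // CR (13 in ASCII) goes before every LF
        }
        out += c;
    }
    return out;
}

// Usage: a raw string literal, whose line breaks are '\n', prepared for a
// CRLF-based protocol.
void demo_crlf() {
    const std::string body = R"(line 1
line 2
line3)";
    assert(to_crlf(body) == "line 1\r\nline 2\r\nline3");
}
```

Converting at the point where the text leaves the program keeps the rest of the code working with plain \n, which is what string literals, raw or not, produce.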

There is an active issue, core language defect report #1655 "Line endings in raw string literals", submitted by Mike Miller on 2013-04-26, in which he asks:

"... is it intended that, for example, a CRLF in the source of a raw string literal is to be represented as a newline character or as the original characters?"

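Since the line-ending value differs depending on the encoding of the original file, and considering that some file systems have no line-ending encoding at all but store lines as records, it's clear that the intention is not to represent the file contents as-is, since that cannot be done in all cases. But as far as I can tell, this DR has not yet been resolved.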

Answer 2

The standard seems to indicate that:

R"(line 1
line 2
line3)"

is equivalent to:

"line 1\nline 2\nline3"

From 2.14.5 String literals in the C++11 standard:

4 [ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
—end note ]

5 [ Example: The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n".
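The two standard examples can be compiled directly, together with the question's literal, to check what a given compiler does. This is a sketch: it assumes the continuation lines inside the literals start in column 0 with no trailing whitespace after the backslash, and the names q, question and check_examples are illustrative. The question's literal is rewritten here with valid R"(...)" delimiters.

```cpp
#include <cassert>
#include <cstring>

// [lex.string]/4 note: inside a raw string the backslash-newline is not a
// line splice, so p holds: a, backslash, new-line, b, new-line, c.
const char* const p = R"(a\
b
c)";

// [lex.string]/5 example: with the delimiter a, only the sequence )a"
// ends the literal, so the body can contain ) and a" freely.
const char* const q = R"a(
)\
a"
)a";

// The question's literal, written with a valid raw-string delimiter.
const char* const question = R"(line 1
line 2
line3)";

void check_examples() {
    assert(std::strcmp(p, "a\\\nb\nc") == 0);
    assert(std::strcmp(q, "\n)\\\na\"\n") == 0);
    assert(std::strcmp(question, "line 1\nline 2\nline3") == 0);
}
```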

Answer 3

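The intent is that a new-line in a raw string literal maps to a single '\n' character. That intent is not expressed as clearly as it should be, which has led to some confusion.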

Citations are to the 2011 ISO C++ standard.

First, here is the evidence that it maps to a single '\n' character.

A note in section 2.14.5 [lex.string] paragraph 4 says:

[ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string-literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:

    const char *p = R"(a\
    b
    c)";
    assert(std::strcmp(p, "a\\\nb\nc") == 0);

— end note ]

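This states clearly that a new-line is mapped to a single '\n' character. It also matches the observed behavior of g++ 6.2.0 and clang++ 3.8.1 (tests done on a Linux system using source files with both Unix-style and Windows-style line endings).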

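Given the intent explicitly stated in the note and the behavior of two popular compilers, I'd say it's safe to rely on this, though it would be interesting to see how other compilers actually handle it.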
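If you do rely on it, a cheap safeguard is to let the compiler verify the assumption for you. A sketch, using a hypothetical contains_cr helper written in C++11 constexpr style (a single return statement): the build fails if a raw string literal ever ends up containing a carriage return, instead of silently producing a different string.

```cpp
// Hypothetical compile-time helper: true if the NUL-terminated string
// contains a carriage return ('\r').
constexpr bool contains_cr(const char* s) {
    return *s != '\0' && (*s == '\r' || contains_cr(s + 1));
}

// Fails to compile if this compiler kept a CR from a CRLF source file.
static_assert(!contains_cr(R"(line 1
line 2
line3)"), "raw string literal contains a carriage return");
```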

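However, a literal reading of the normative wording of the standard could easily lead to a different conclusion, or at least to uncertainty.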

Section 2.5 [lex.pptoken] paragraph 3 says (emphasis added):

Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified.

The translation phases are specified in 2.2 [lex.phases]. In phase 1:

Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary.

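If we assume that the mapping of physical source file characters to the basic source character set, and the introduction of new-line characters, are "transformations", we might reasonably conclude that, for example, a new-line in the middle of a raw string literal in a Windows-format source file should be equivalent to a \r\n sequence. (I can imagine that being useful for Windows-specific code.)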

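(This interpretation does cause problems on systems where the end-of-line indicator is not a character sequence, for example where each line is a fixed-width record. Such systems are rare these days.)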
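Code that cannot afford to depend on either reading can normalize at run time. The following is a sketch; normalize_newlines is a hypothetical helper that collapses each CR LF pair into a single \n and leaves lone CRs untouched, so the result is the same whether the compiler produced \n or \r\n for a line break.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: collapses every "\r\n" pair into a single '\n'.
// A '\r' that is not followed by '\n' is kept as-is.
std::string normalize_newlines(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '\r' && i + 1 < s.size() && s[i + 1] == '\n') {
            continue;  // drop the CR; the LF is appended on the next pass
        }
        out += s[i];
    }
    return out;
}

// Whichever interpretation the compiler follows, the normalized text matches.
void demo_normalize() {
    const std::string literal = R"(line 1
line 2)";
    assert(normalize_newlines(literal) == "line 1\nline 2");
    assert(normalize_newlines("line 1\r\nline 2") == "line 1\nline 2");
}
```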

As another answer points out, there is an open Defect Report for this issue. It was submitted in 2013 and has not yet been resolved.

Personally, I think the root of the confusion is the word "any" (emphasis as before):

Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified.

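The mapping of physical source file characters to the basic source character set can reasonably be thought of as a transformation. The parenthesized clause "(trigraphs, universal-character-names, and line splicing)" seems intended to specify which transformations are to be reverted, but it either tries to change the meaning of the word "transformations" (which the standard does not formally define) or contradicts the use of the word "any".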

I suggest that changing the word "any" to "certain" would express the apparent intent more clearly:

Between the initial and final double quote characters of the raw string, certain transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified.

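This wording would make it clear that "trigraphs, universal-character-names, and line splicing" are the only transformations to be reverted. (Not everything done in translation phases 1 and 2 is reverted, only those specific listed transformations.)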


c++

portability

c++11