Do useless backslashes produce well-defined string constants?

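Both C and C++ support a seemingly equivalent set of escape sequences, such as \b, \t, \n, \" and others, all introduced by the backslash character (\). How is the backslash handled when it is followed by an ordinary character? As far as I remember, several compilers silently skipped the escape character \. On cppreference.com I read the C and C++ articles on escape sequences.

I only found this note about orphan backslashes (in the C article):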

ISO C requires a diagnostic if the backslash is followed by any character not listed here: [...]

The note refers to the table above it in that article. I also tried some online compilers.

C demo

#include <stdio.h>
#include <string.h>  /* strcmp */

int main(void) {
    /* Print 1 if the two literals compare equal, 0 otherwise. */
    printf("%d", !strcmp("\ x", "\ x"));
    printf("%d", !strcmp("\ x", "\\ x"));
    printf("%d", !strcmp("\ x", "\\ x"));
    return 0;
}

C++ demo

#include <iostream>
#include <string>
using namespace std;

int main() {
    cout << (string("\ x") == "\ x");
    cout << (string("\ x") == "\\ x");
    cout << (string("\ x") == "\\ x");
    return 0;
}

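Both treat "\ x" and "\\ x" as equivalent, with (some sort of) warning shown through the syntax highlighting. IOW, "\\ x" is converted to "\ x".

Can I assume that this is defined behavior?

Clarification (edit)


Edit #2: focused more on the generation of the constants (and on portability).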

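You need to compile with a standard-conforming C compiler. Most online compilers use gcc, which defaults to a "lax non-standard mode", a.k.a. GNU C. That mode may or may not enable some non-standard escape sequences, but it also does not produce a compiler error even when you violate the C language. You may get away with a "warning", but that does not make the code valid C.

If you tell gcc to behave as a conforming C compiler with -std=c17 -pedantic-errors, you get this error: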

error: unknown escape sequence: '\040'

\040 is octal for 32, which is the ASCII code for ' '. (For some reason gcc uses octal notation for escape sequences internally, possibly because \0 is octal; I don't know why.)

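The answer is no. This is an invalid C program, and a C++ program whose behavior the standard leaves unspecified.

The C standard

says this is syntactically wrong (emphasis mine): no valid token is produced, so the program is invalid: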

5.2.1 Character sets

2/ In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters.

6.4.4.4 Character constants

3/ The single-quote ', the double-quote ", the question-mark ?, the backslash \, and arbitrary integer values are representable according to the following table of escape sequences:

  • single quote ' \'
  • double quote " \"
  • question mark ? \?
  • backslash \ \\
  • octal character \octal digits
  • hexadecimal character \xhexadecimal digits

8/ In addition, characters not in the basic character set are representable by universal character names and certain nongraphic characters are representable by escape sequences consisting of the backslash \ followed by a lowercase letter: \a, \b, \f, \n, \r, \t, and \v. Note: If any other character follows a backslash, the result is not a token and a diagnostic is required.

The C++ standard

puts it differently (emphasis mine):

5.13.3 Character literals

7/ Certain non-graphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 8. The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. Escape sequences in which the character following the backslash is not listed in Table 8 are conditionally-supported, with implementation-defined semantics. An escape sequence specifies a single character.

So for C++ you need to consult your compiler's manual for the semantics, but the program is syntactically valid.