Do useless backslashes produce well-defined string constants?
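C and C++ both support a set of seemingly equivalent escape sequences such as \b, \t, \n, \" and others that begin with the backslash character (\). What happens to the backslash when it is followed by an ordinary character? As far as I remember, several compilers silently skip the escape character \. I read the relevant articles on cppreference.com and found only this note about lone backslashes (in the C article):
ISO C requires a diagnostic if the backslash is followed by any character not listed here: [...]
("Here" refers to the table of escape sequences above it in that article.) I also tried some online compilers: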
C demo
#include <stdio.h>
#include <string.h>

int main(void) {
    /* "\ x" deliberately uses backslash-space, which is not a C escape sequence. */
    printf("%d", !strcmp("\ x", "\ x"));
    printf("%d", !strcmp("\ x", " x"));
    printf("%d", !strcmp("\ x", "\\ x"));
    return 0;
}
C++ demo
#include <iostream>
#include <string>
using namespace std;

int main() {
    // "\ x" deliberately uses backslash-space, which is not in the escape table.
    cout << (string("\ x") == "\ x");
    cout << (string("\ x") == " x");
    cout << (string("\ x") == "\\ x");
    return 0;
}
Both treat "\ x" and " x" as equivalent, with (sort of) a warning shown through syntax highlighting. IOW, "\ x" is converted to " x".
Can I assume this is defined behavior?
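Clarification (edit)
- I am not asking about obviously invalid string literals such as "\".
- I know that lone backslashes are somewhat problematic.
- I want to know whether the constant the compiler builds from such a literal is defined.
Edit #2: More focus on how the constant is generated (and on portability).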
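You need to compile with a conforming C compiler. Various online compilers tend to use gcc, which by default runs in a "lax non-standard mode", a.k.a. GNU C. That mode may or may not enable some non-standard escape sequences. It also won't produce compiler errors even when you violate the C language. You may get away with a "warning", but that doesn't make the code valid C.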
If you tell gcc to behave as a conforming C compiler with -std=c17 -pedantic-errors, you get this error:
error: unknown escape sequence: '\040'
040 is octal for 32, which is the ASCII code of ' ' (space). (For some reason gcc uses octal notation for escape sequences internally, perhaps because \0 is octal. I don't know why.)
The answer is no. It is an invalid C program. In C++, the escape is conditionally-supported with implementation-defined semantics.
The C standard says it is syntactically wrong. The sequence does not produce a valid token, so the program is invalid:
5.2.1 Character sets
2/ In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters.
6.4.4.4 Character constants
3/ The single-quote ', the double-quote ", the question-mark ?, the backslash \, and arbitrary integer values are representable according to the following table of escape sequences:
- single quote ': \'
- double quote ": \"
- question mark ?: \?
- backslash \: \\
- octal character: \octal digits
- hexadecimal character: \xhexadecimal digits
8/ In addition, characters not in the basic character set are representable by universal character names and certain nongraphic characters are representable by escape sequences consisting of the backslash \ followed by a lowercase letter: \a, \b, \f, \n, \r, \t, and \v. Note: If any other character follows a backslash, the result is not a token and a diagnostic is required.
The C++ standard says something different:
5.13.3 Character literals
7/ Certain non-graphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 8. The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. Escape sequences in which the character following the backslash is not listed in Table 8 are conditionally-supported, with implementation-defined semantics. An escape sequence specifies a single character.
So for C++ you need to consult your compiler's documentation for the semantics, but the program is syntactically valid.