Why does GCC emit a warning when using trigraphs, but not when using digraphs?
Code:
#include <stdio.h>
int main(void)
{
??< puts("Hello Folks!"); ??>
}
The above program, compiled with GCC 4.8.1 using -Wall and -std=c11, gives the following warnings:
source_file.c: In function ‘main’:
source_file.c:8:5: warning: trigraph ??< converted to { [-Wtrigraphs]
??< puts("Hello Folks!"); ??>
^
source_file.c:8:30: warning: trigraph ??> converted to } [-Wtrigraphs]
But when I change the body of main to:
<% puts("Hello Folks!"); %>
no warning is emitted.
So why does the compiler warn me when I use trigraphs, but not when I use digraphs?
This gcc document on pre-processing gives a good rationale for the warning (emphasis mine):
Trigraphs are not popular and many compilers implement them incorrectly. Portable code should not rely on trigraphs being either converted or ignored. With -Wtrigraphs GCC will warn you when a trigraph may change the meaning of your program if it were converted.
and this gcc document on Tokenization explains that digraphs, unlike trigraphs, have no potential negative side effects (emphasis mine):
There are also six digraphs, which the C++ standard calls alternative tokens, which are merely alternate ways to spell other punctuators. This is a second attempt to work around missing punctuation in obsolete systems. It has no negative side effects, unlike trigraphs,
Probably because digraphs, unlike trigraphs, have no negative side effects, as the gcc documentation states:
Punctuators are all the usual bits of punctuation which are meaningful to C and C++. All but three of the punctuation characters in ASCII are C punctuators. The exceptions are ‘@’, ‘$’, and ‘`’. In addition, all the two- and three-character operators are punctuators. There are also six digraphs, which the C++ standard calls alternative tokens, which are merely alternate ways to spell other punctuators. This is a second attempt to work around missing punctuation in obsolete systems. It has no negative side effects, unlike trigraphs, but does not cover as much ground. The digraphs and their corresponding normal punctuators are:
Digraph:     <%  %>  <:  :>  %:  %:%:
Punctuator:  {   }   [   ]   #   ##
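Because trigraphs have the undesirable effect of silently changing code. The same source file is valid both with and without trigraph replacement, but leads to different code. This is especially problematic in string literals, like "<em>What??</em>", where ??< is replaced by {, so the literal becomes "<em>What{/em>".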
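Language design and language evolution should try to avoid silent changes. Having the compiler warn about trigraphs is a good thing.
Contrast this with digraphs, which are new tokens and do not lead to silent changes.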
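Trigraphs are nasty because they use character sequences that can legitimately appear in valid code. A common case that used to cause compiler errors in code for the classic Macintosh: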
unsigned int signature = '????'; /* Should be value 0x3F3F3F3F */
Trigraph processing would turn that into:
unsigned int signature = '??^; /* Should be value 0x3F3F3F3F */
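This of course won't compile. In some slightly rarer cases, such processing can produce code that compiles but means something other than what was intended, e.g.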
char *template = "????/1234";
would become
char *template = "??S4"; // ??/ becomes \, and the resulting \123 is the octal escape for 'S'
Not the intended string literal, but still perfectly legitimate code.
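By contrast, digraphs are relatively benign because, apart from some odd corner cases possibly involving macros, code containing sequences that digraph processing would handle would have no legitimate meaning without that processing.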