To which degree does the C preprocessor regard integer literal suffixes?

Today, I stumbled over something like this:

#define FOO 2u

#if (FOO == 2)
  unsigned int foo = FOO;
#endif

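Regardless of why the code is like this (let's not question the why), I was wondering to what degree the preprocessor can handle integer literal suffixes at all. I was actually surprised that it worked. After doing some experiments with GCC and C99 with this code ...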

#include <stdio.h>

int main()
{
  #if (1u == 1)
    printf("1u == 1\n");
  #endif

  #if (1u + 1l == 2ll)
    printf("1u + 1l == 2ll\n");
  #endif

  #if (1ull - 2u == -1)
    printf("1ull - 2u == -1\n");
  #endif

  #if (1u - 2u == 0xFFFFFFFFFFFFFFFF)
    printf("1u - 2u == 0xFFFFFFFFFFFFFFFF\n");
  #endif

  #if (-1 == 0xFFFFFFFFFFFFFFFF)
    printf("-1 == 0xFFFFFFFFFFFFFFFF\n");
  #endif

  #if (-1l == 0xFFFFFFFFFFFFFFFF)
    printf("-1l == 0xFFFFFFFFFFFFFFFF\n");
  #endif

  #if (-1ll == 0xFFFFFFFFFFFFFFFF)
    printf("-1ll == 0xFFFFFFFFFFFFFFFF\n");
  #endif
}

... which simply prints all statements:

1u == 1
1u + 1l == 2ll
1ull - 2u == -1
1u - 2u == 0xFFFFFFFFFFFFFFFF
-1 == 0xFFFFFFFFFFFFFFFF
-1l == 0xFFFFFFFFFFFFFFFF
-1ll == 0xFFFFFFFFFFFFFFFF

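... I guess the preprocessor simply ignores integer literal suffixes altogether and probably always does arithmetic and comparisons in the native integer size, in this case 64 bit?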

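So, here is what I would like to know:

  1. To which degree does the preprocessor regard integer literal suffixes? Or does it just ignore them?
  2. Are there any dependencies or different behaviors with different environments, e.g. different compilers, C vs. C++, 32 bit vs. 64 bit machine, etc.? I.e., what does the preprocessor's behavior depend on?
  3. Where is all that specified/documented?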

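I wanted to find this out myself and checked Wikipedia and the C standard (working paper). I found information about integer suffixes and information about the preprocessor, but none about the combination of the two. Obviously, I also googled it but did not get any useful results.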

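I have seen another discussion that clarifies where this should be specified, but I still could not find an answer to my questions.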

C 2018 6.10.1 deals with conditional inclusion (#if and related statements and the defined operator). Paragraph 1 says:

The expression that controls conditional inclusion shall be an integer constant expression except that: identifiers (including those lexically identical to keywords) are interpreted as described below; and it may contain unary operator expressions of the form

defined identifier

or

defined ( identifier )

Integer constant expression is defined in 6.6 6:

An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.

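That paragraph is for C generally, not just the preprocessor. So the expressions that can appear in #if statements are the same as the integer constant expressions that can appear generally in C. However, as stated in the quote above, sizeof and _Alignof are just identifiers; they are not recognized as C operators. In particular, 6.10.1 4 tells us: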

… After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers (including those lexically identical to keywords) are replaced with the pp-number 0,…

So, where sizeof or _Alignof appears in a #if expression, it becomes 0. Thus, a #if expression can only have operands that are constants and defined expressions.

Paragraph 4 goes on to say:

… The resulting tokens compose the controlling constant expression which is evaluated according to the rules of 6.6. For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t defined in the header <stdint.h>.…

6.6 is the section for constant expressions.

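So, the compiler will accept integer suffixes in #if expressions, and this does not depend on the C implementation (for the suffixes required in the core C language; implementations could allow extensions). However, all the arithmetic will be performed using intmax_t or uintmax_t, and those do depend on the implementation. If your expressions do not depend on the width of integers above the minimum required (see footnote 1), they should be evaluated identically in any C implementation.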

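Additionally, paragraph 4 goes on to say there may be some variations with character constants and values, which I omit here as it is not relevant to this question.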

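Footnote

1 intmax_t designates a signed type capable of representing any value of any signed integer type (7.20.1.5 1), and long long int is a signed type that must be at least 64 bits (5.2.4.2.1 1), so any conforming C implementation must provide 64-bit integer arithmetic in the preprocessor.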

As I noted in a comment, this is defined in the C standard. Here is the complete text of §6.10.1 ¶4 (and the two footnotes):

C11 §6.10.1 Conditional inclusion

¶4 Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. If the token defined is generated as a result of this replacement process or use of the defined unary operator does not match one of the two specified forms prior to macro replacement, the behavior is undefined. After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers (including those lexically identical to keywords) are replaced with the pp-number 0, and then each preprocessing token is converted into a token. The resulting tokens compose the controlling constant expression which is evaluated according to the rules of 6.6. For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t defined in the header <stdint.h>.167) This includes interpreting character constants, which may involve converting escape sequences into execution character set members. Whether the numeric value for these character constants matches the value obtained when an identical character constant occurs in an expression (other than within a #if or #elif directive) is implementation-defined.168) Also, whether a single-character character constant may have a negative value is implementation-defined.

167) Thus, on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000 is signed and positive within a #if expression even though it would be unsigned in translation phase 7.

168) Thus, the constant expression in the following #if directive and if statement is not guaranteed to evaluate to the same value in these two contexts.

#if 'z' - 'a' == 25
if ('z' - 'a' == 25)

Section 6.6 is §6.6 Constant expressions, which details the differences between the full expressions in section §6.5 Expressions and constant expressions.

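In effect, the preprocessor largely ignores the length suffixes (l and ll), while a u suffix still makes a constant unsigned. Hexadecimal constants that do not fit in intmax_t, such as 0xFFFFFFFFFFFFFFFF when intmax_t has 64 bits, are unsigned. The results you show are to be expected on a machine where intmax_t and uintmax_t are 64-bit quantities. If the limits of intmax_t and uintmax_t were larger, some of the expressions might change.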

  1. To which degree does the preprocessor regard integer literal suffixes? Or does it just ignore them?

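The type suffixes of integer constants are not inherently meaningful to the preprocessor, but they are an inherent part of the corresponding preprocessing tokens, not separate from them. The standard has this to say about them: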

A preprocessing number begins with a digit optionally preceded by a period (.) and may be followed by valid identifier characters and the character sequences e+, e-, E+, E-, p+, p-, P+, or P-.

Preprocessing number tokens lexically include all floating and integer constant tokens.

(C11 6.4.8/2-3; emphasis added)

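For the most part, the preprocessor does not treat preprocessing tokens of this type any differently from any other. The exception is in the controlling expressions of #if directives, which are evaluated by performing macro expansion, replacing identifiers with 0, and then converting each preprocessing token into a token before evaluating the result according to C rules. Converting to tokens accounts for the type suffixes, yielding bona fide integer constants.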

This does not necessarily yield the same results that you would get from runtime evaluation of the same expressions, however, because

For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t.

(C2011, 6.10.1/4)

You go on to ask

  2. Are there any dependencies or different behaviors with different environments, e.g. different compilers, C vs. C++, 32 bit vs. 64 bit machine, etc.? I.e., what does the preprocessor's behavior depend on?

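The only direct dependency is the implementation's definitions of intmax_t and uintmax_t. These are not directly tied to language choice or machine architecture, though there may be correlations with those.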

  3. Where is all that specified/documented?

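In the language specifications for the respective languages, of course. I have quoted two of the more relevant sections of the C11 specification, and linked you to a late draft of that standard. (The current C is C18, but it has not changed in any of these respects.)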

TL;DR version:

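l and ll are effectively (not literally!) ignored by preprocessor conditions (basically, everything is treated as if it had an ll suffix), but u is taken into account (as usual, as for every C integer constant)!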

After reading all the wonderful answers, I created some more examples that reveal some expected yet interesting behavior:

#include <stdio.h>

int main()
{
#if (1 - 2u > 0) // If one operand is unsigned, the result is unsigned.
                 // Usual implicit type conversion.
  printf("1 - 2u > 0\n");
#endif

#if (0 < 0xFFFFFFFFFFFFFFFF)
  printf("0 < 0xFFFFFFFFFFFFFFFF\n");
#endif

#if (-1 < 0)
  printf("-1 < 0\n");
#endif

#if (-1 < 0xFFFFFFFFFFFFFFFF)
  printf("-1 < 0xFFFFFFFFFFFFFFFF\n"); // nope
#elif (-1 > 0xFFFFFFFFFFFFFFFF)
  printf("-1 > 0xFFFFFFFFFFFFFFFF\n"); // nope, obviously
#endif

#if (-1 == 0xFFFFFFFFFFFFFFFF)
  printf("-1 == 0xFFFFFFFFFFFFFFFF (!!!)\n");
#endif
}

With this output:

1 - 2u > 0
0 < 0xFFFFFFFFFFFFFFFF
-1 < 0
-1 == 0xFFFFFFFFFFFFFFFF (!!!)