The `is` operator behaves unexpectedly with non-cached integers

Question

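When playing around with the Python interpreter, I stumbled upon this conflicting case regarding the `is` operator:

If the evaluation takes place inside a function it returns True; if it is done outside the function it returns False.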

>>> def func():
...     a = 1000
...     b = 1000
...     return a is b
...
>>> a = 1000
>>> b = 1000
>>> a is b, func()
(False, True)

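Since the `is` operator compares the `id()`s of the objects involved, this means that a and b point to the same int instance when declared inside the function func but, on the contrary, point to different objects when declared outside of it.

Why is this so?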

Note: I am aware of the difference between identity (`is`) and equality (`==`) operations as described in Understanding Python's "is" operator. In addition, I'm also aware of the caching that Python performs for integers in the range [-5, 256], as described in "is" operator behaves unexpectedly with integers.

This isn't the case here, since the numbers are outside that range and I do want to evaluate identity, not equality.

Answer 1

tl;dr:

As the reference manual states:

A block is a piece of Python program text that is executed as a unit. The following are blocks: a module, a function body, and a class definition. Each command typed interactively is a block.

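This is why, in the case of a function, you have a single code block which contains a single object for the numeric literal 1000, so `id(a) == id(b)` will yield True.

In the second case, you have two distinct code objects, each with its own separate object for the literal 1000, so `id(a) != id(b)`.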

Note that this behavior doesn't manifest only with int literals; you'll get similar results with, for example, float literals (see here).
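As a minimal sketch of the float case, assuming CPython: the function name `same_float_literals` and the value 1000.5 are made up for illustration. Both assignments load the one float constant stored in the function's code object, so the identity check succeeds.

```python
def same_float_literals():
    # Both names are bound to the single 1000.5 constant kept in this
    # function's code object (a CPython implementation detail).
    x = 1000.5
    y = 1000.5
    return x is y

result = same_float_literals()

# Collect the float constants the compiler stored for the function body.
float_constants = [
    c for c in same_float_literals.__code__.co_consts if isinstance(c, float)
]
print(result, float_constants)
```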

Of course, comparing objects (except for explicit `is None` tests) should always be done with the equality operator `==` and not `is`.
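A small illustration of that advice; the helper names `is_thousand` and `is_missing` are hypothetical. Equality gives the same answer no matter which int object holds the value, while identity is reserved for the None singleton.

```python
def is_thousand(value):
    # Equality compares values, so it does not matter which int object is passed in.
    return value == 1000

def is_missing(value):
    # Identity is the appropriate test for the None singleton.
    return value is None

parsed = int("1000")  # built at run time rather than loaded from a constant table
print(is_thousand(parsed), is_missing(None), is_missing(0))
```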

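Everything stated here applies to the most popular implementation of Python, CPython. Other implementations might differ, so no assumptions should be made when using them.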

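Longer answer:

To get a clearer view and additionally verify this seemingly odd behavior, we can look directly at the code objects for each of these cases using the dis module.

For the function func:

Along with all its other attributes, a function object also has a `__code__` attribute that lets you peek into the compiled bytecode for that function. Using dis.code_info we can get a nice, readable view of all the attributes stored in the code object of a given function: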

>>> import dis
>>> print(dis.code_info(func))
Name:              func
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  2
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: 1000
Variable names:
   0: a
   1: b

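We're only interested in the Constants entry for function func. In it, we can see that we have two values, None (always present) and 1000. We have only a single int instance that represents the constant 1000. This is the value that a and b are going to be assigned to when the function is invoked.

Accessing this value is easy via `func.__code__.co_consts[1]`, so another way to view our `a is b` evaluation in the function would be like so: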

>>> id(func.__code__.co_consts[1]) == id(func.__code__.co_consts[1])

This will, of course, evaluate to True because we're referring to the same object.
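The same check can be written without relying on the constant sitting at index 1, which may vary between CPython versions. This is only a sketch: it redefines func from the question and filters `co_consts` for the int 1000.

```python
def func():
    a = 1000
    b = 1000
    return a is b

# Collect every int constant equal to 1000 stored in the function's code object.
thousands = [
    c for c in func.__code__.co_consts if type(c) is int and c == 1000
]

# Two assignments in the source, but only one stored constant to load from.
print(len(thousands), func())
```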

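For each interactive command:

As noted previously, each interactive command is interpreted as a single code block: parsed, compiled and evaluated independently.

We can get the code object for each command via the compile built-in: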

>>> com1 = compile("a=1000", filename="", mode="single")
>>> com2 = compile("b=1000", filename="", mode="single")

For each assignment statement, we will get a similar-looking code object that looks like the following:

>>> print(dis.code_info(com1))
Name:              <module>
Filename:          
Argument count:    0
Kw-only arguments: 0
Number of locals:  0
Stack size:        1
Flags:             NOFREE
Constants:
   0: 1000
   1: None
Names:
   0: a

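The same command for com2 looks the same but has a fundamental difference: each of the code objects com1 and com2 has a different int instance representing the literal 1000. This is why, in this case, when we do `a is b` via the co_consts attribute, we actually get: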

>>> id(com1.co_consts[0]) == id(com2.co_consts[0])
False

Which agrees with what we actually got.

Different code objects, different contents.
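To reproduce the interactive case from a script, the two commands can be compiled and executed separately into one namespace. This is a sketch, with the filenames `<cmd1>` and `<cmd2>` chosen arbitrarily. On the CPython versions discussed here the identity check is expected to come out False, but since that is an implementation detail the snippet prints the result rather than assuming it.

```python
# Each compile() call stands in for one line typed at the interactive prompt.
com1 = compile("a = 1000", filename="<cmd1>", mode="single")
com2 = compile("b = 1000", filename="<cmd2>", mode="single")

namespace = {}
exec(com1, namespace)
exec(com2, namespace)

# The values are equal, but each name is bound to the constant held by its
# own code object, so the identity check is free to come out False.
values_equal = namespace["a"] == namespace["b"]
separately_compiled_identical = namespace["a"] is namespace["b"]
print(values_equal, separately_compiled_identical)
```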

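Note: I was somewhat curious as to how exactly this happens in the source code, and after digging through it I believe I finally found it.

During the compilation phase, the co_consts attribute is represented by a dictionary object. In compile.c we can actually see the initialization: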

/* snippet for brevity */

u->u_lineno = 0;
u->u_col_offset = 0;
u->u_lineno_set = 0;
u->u_consts = PyDict_New();  

/* snippet for brevity */

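During compilation, this dictionary is checked for constants that already exist. Answer 2 discusses this check in a bit more detail.

Caveats:

Chained statements evaluate to an identity check of True:

It should be clearer now why exactly the following evaluates to True: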
```
 >>> a = 1000; b = 1000;
 >>> a is b
```
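In this case, by chaining the two assignment commands together we tell the interpreter to compile them together. As in the case of the function object, only one object for the literal 1000 will be created, resulting in a True value when evaluated.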
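A sketch of the chained case outside the prompt: compiling both assignments as one "single"-mode command yields one code object, and both names end up bound to its one 1000 constant. The filename `<chained>` is arbitrary.

```python
# One command, one code object, one stored constant for the literal 1000.
chained = compile("a = 1000; b = 1000", filename="<chained>", mode="single")

namespace = {}
exec(chained, namespace)

chained_identical = namespace["a"] is namespace["b"]
thousands = [c for c in chained.co_consts if type(c) is int and c == 1000]
print(chained_identical, len(thousands))
```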

Execution on a module level yields True again:

As mentioned previously, the reference manual states that:

... The following are blocks: a module ...

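So the same premise applies: we will have a single code object (for the module) and, as a result, a single value stored for each different literal.

The same doesn't apply for mutable objects:

Meaning that unless we explicitly initialize to the same mutable object (for example with `a = b = []`), the identity of the objects will never be equal, for example: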

    a = []; b = []
    a is b  # always evaluates to False

Again, in the documentation, this is specified:

after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists.
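The guarantee for mutable objects holds even inside a single code block, which a short sketch can confirm; the function names `fresh_lists` and `shared_list` are made up for illustration.

```python
def fresh_lists():
    # Each [] display builds a new list at run time, even within one code block.
    a = []
    b = []
    return a is b

def shared_list():
    # A single list object bound to two names.
    a = b = []
    return a is b

print(fresh_lists(), shared_list())
```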

Answer 2

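In the interactive prompt, entries are compiled in single mode, which processes one complete statement at a time. The compiler itself (in Python/compile.c) tracks the constants in a dictionary called u_consts that maps the constant object to its index.

In the compiler_add_o() function, you can see that before adding a new constant (and incrementing the index), the dict is checked to see whether the constant object and index already exist. If so, they are reused.

In short, that means that repeated constants in one statement (such as in your function definition) are folded into one singleton. In contrast, your `a = 1000` and `b = 1000` are two separate statements, so no folding takes place.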
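The lookup-before-insert idea can be modeled in a few lines of Python. This is a simplified, hypothetical stand-in (the class `ConstantTable` and its `(type, value)` key are assumptions made for illustration), not the actual C logic in compile.c.

```python
class ConstantTable:
    """Hypothetical model of a per-code-object constant dictionary."""

    def __init__(self):
        self._index_by_key = {}  # maps a constant's key to its index
        self.constants = []      # constants in index order

    def add(self, value):
        # Key on (type, value) so that 1000 and 1000.0 stay distinct;
        # this is a simplification assumed for the sketch.
        key = (type(value), value)
        if key in self._index_by_key:
            return self._index_by_key[key]  # already present: reuse its index
        index = len(self.constants)
        self._index_by_key[key] = index
        self.constants.append(value)
        return index

table = ConstantTable()
first = table.add(1000)    # new constant, gets index 0
second = table.add(1000)   # repeated constant, same index is returned
other = table.add(1000.0)  # different type, gets its own index
print(first, second, other, len(table.constants))
```

A separate statement at the prompt corresponds to a fresh table, which is why repeated literals are not shared across commands.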

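FWIW, this is all just a CPython implementation detail (i.e. not guaranteed by the language). This is why the references given here are to the C source code rather than the language specification, which makes no guarantees on the subject.

Hope you enjoyed this insight into how CPython works under the hood :-)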
