Clarification for "it should be possible to change the value of 1" from the CPython documentation

Question

Looking at this link: https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong

The current implementation keeps an array of integer objects for all integers between -5 and 256; when you create an int in that range, you actually just get back a reference to the existing object. So, it should be possible to change the value of 1. I suspect the behavior of Python, in this case, is undefined. :-)

What does the bold sentence, "So, it should be possible to change the value of 1.", mean in this context?

Answer 1

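Since objects are returned by reference, if you change the object, it changes everywhere in the program.

So, taking the value 1 as an example, you could change it to 42. This is possible because the C API gives you access to the internals of the Python interpreter; it seems unlikely that you could do this from a Python script itself (without using something like cffi, for example).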
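A quick way to see that shared reference without touching any internals is to compare the identities of ints created independently at runtime. This is a minimal sketch that relies on a CPython implementation detail (the -5..256 cache quoted in the question); other implementations may behave differently.

```python
# Build ints at runtime via int() so the compiler cannot merge them as constants.
a = int("1")
b = int("1")
print(a is b)    # True on CPython: both names refer to the one cached object for 1

c = int("1000")
d = int("1000")
print(c is d)    # False on CPython: 1000 is outside the cache, so two objects exist
print(c == d)    # True: the values are still equal
```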

Answer 2

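This means that integers in Python are actual objects with a "value" field that holds the integer's value. In Java, you could express Python's integers like this (leaving out a lot of details, of course):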

class PyInteger {

    private int value;

    public PyInteger(int val) {
        this.value = val;
    }

    public PyInteger __add__(PyInteger other) {
        return new PyInteger(this.value + other.value);
    }
}

To avoid having many Python integer objects with the same value lying around, it caches some integers, along the lines of:

PyInteger[] cache = {
  new PyInteger(0),
  new PyInteger(1),
  new PyInteger(2),
  ...
}

But what happens if you do something like this (let's ignore for a moment that value is private):

PyInteger one = cache[1];  // the PyInteger representing 1
one.value = 3;

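Suddenly, every time you use 1 in your program, you would actually get back 3, because the object representing 1 now has an effective value of 3.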
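Here is the same thought experiment as a minimal Python sketch. `ToyInt`, `CACHE`, and `make_int` are hypothetical names for illustration; this models the idea, not CPython's actual implementation.

```python
class ToyInt:
    """Hypothetical boxed integer, standing in for PyInteger above."""

    def __init__(self, value: int) -> None:
        self.value = value


# Pre-built objects for -5..256, mirroring the range quoted from the docs.
CACHE = {n: ToyInt(n) for n in range(-5, 257)}


def make_int(n: int) -> ToyInt:
    """Return the shared cached object for small n, otherwise a fresh object."""
    if n in CACHE:
        return CACHE[n]
    return ToyInt(n)


x = make_int(1)
y = make_int(1)
print(x is y)                                  # True: one shared object

one = CACHE[1]
one.value = 3                                  # mutate the shared "1" in place
print(make_int(1).value)                       # 3
print([make_int(i).value for i in range(3)])   # [0, 3, 2]
```

Because every request for 1 goes through the cache, the single mutation is visible to all later users of 1.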

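And indeed, you can do this in Python! That is: it is possible to change the effective numeric value of an integer in Python. There is an answer in this reddit post. For completeness, though, I copy it here (original credits go to Veedrac):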

import ctypes

def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

deref(id(29), ctypes.c_int)[6] = 100
#>>> 

29
#>>> 100

29 ** 0.5
#>>> 10.0

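The Python specification itself says nothing about how integers are stored or represented internally. Nor does it say which integers should be cached, or that any should be cached at all. In short: nothing in the Python specification defines what should happen if you do something silly like this ;-).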

We could even go slightly further...

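In fact, the field value above is really an array of integers, emulating arbitrarily large integer values (for a 64-bit integer, you simply combine two 32-bit fields, and so on). However, once integers grow large and no longer fit in a standard 32-bit integer, caching is no longer a viable option. Even if you used a dictionary, comparing integer arrays for equality would carry too much overhead for too little gain.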
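To make the "array of integers" idea concrete, here is a small sketch that splits a non-negative int into fixed-width digits, least significant first, and reassembles it. The 30-bit digit width is an assumption matching what CPython uses on typical 64-bit builds (rather than the 32-bit fields in the simplified description); `to_digits` and `from_digits` are hypothetical helper names.

```python
DIGIT_BITS = 30                      # assumed digit width (CPython's usual choice on 64-bit builds)
DIGIT_MASK = (1 << DIGIT_BITS) - 1


def to_digits(n: int) -> list[int]:
    """Split a non-negative int into base-2**30 digits, least significant first."""
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    digits = []
    while n:
        digits.append(n & DIGIT_MASK)
        n >>= DIGIT_BITS
    return digits


def from_digits(digits: list[int]) -> int:
    """Reassemble the value from its digits (most significant digit applied last)."""
    n = 0
    for d in reversed(digits):
        n = (n << DIGIT_BITS) | d
    return n


print(to_digits(29))          # [29]
print(to_digits(2**64 - 1))   # [1073741823, 1073741823, 15]
```

With 30-bit digits, a full 64-bit value needs three digits; the longer the digit array, the less attractive it becomes to look values up in a cache.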

You can actually check this yourself by comparing identities with is:

>>> 3 * 4 is 12
True
>>> 300 * 400 is 120000
False
>>> 300 * 400 == 120000
True
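Depending on the CPython version, the compiler may fold `300 * 400` into a constant before the code runs, and Python 3.8+ also emits a SyntaxWarning for `is` against a literal, so the session above may not reproduce exactly. A sketch that does the arithmetic at runtime avoids both effects; `product` is a hypothetical helper, and the results remain a CPython implementation detail.

```python
def product(x: int, y: int) -> int:
    return x * y    # evaluated when called, so nothing is folded at compile time


small = product(3, 4)
big = product(300, 400)

print(small is int("12"))       # True on CPython: 12 comes from the small-int cache
print(big is int("120000"))     # False on CPython: each 120000 is a separate object
print(big == int("120000"))     # True: equal values
```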

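In a typical Python system, there is exactly one object representing the number 12. 120000, on the other hand, is hardly ever cached. So above, 300 * 400 yields a new object representing 120000, which is different from the object created for the number on the right-hand side.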

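Why does this matter? If you change the value of a small number like 1 or 29, it affects every calculation that uses that number. You will most likely seriously break your system (until you restart). But if you change the value of a large integer, the effect will be minimal.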

Changing the value of 12 to 13 means that 3 * 4 will yield 13. Changing the value of 120000 to 130000 has much less effect, and 300 * 400 will still yield a (new) 120000 rather than 130000.
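A sketch of that contrast using a toy model. `ToyInt`, `CACHE`, `make_int`, and `mul` are hypothetical, illustrative names, not CPython APIs: products are boxed through the cache, so corrupting the cached 12 leaks into every 3 * 4, while corrupting one 120000 object does not affect the next multiplication.

```python
class ToyInt:
    """Hypothetical boxed integer; a stand-in, not CPython's real int."""

    def __init__(self, value: int) -> None:
        self.value = value


CACHE = {n: ToyInt(n) for n in range(-5, 257)}   # assumed cache range


def make_int(n: int) -> ToyInt:
    return CACHE[n] if n in CACHE else ToyInt(n)


def mul(a: ToyInt, b: ToyInt) -> ToyInt:
    """Multiply, then box the result (reusing the cache for small results)."""
    return make_int(a.value * b.value)


# Tamper with the cached 12: every product that equals 12 is affected.
CACHE[12].value = 13
print(mul(make_int(3), make_int(4)).value)        # 13

# Tamper with one 120000 object: only that object is affected.
big = mul(make_int(300), make_int(400))
big.value = 130000
print(mul(make_int(300), make_int(400)).value)    # 120000 (a fresh object)
```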

Things become even less predictable once you take other Python implementations into account. MicroPython, for instance, does not have objects for small numbers, but emulates them on the fly, and PyPy might just optimize your changes away.

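Bottom line: the exact behaviour of numbers you tinker with is truly undefined, and depends on several factors and on the exact implementation.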

Answering a question from the comments: what is the significance of the 6 in Veedrac's code above?

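All objects in Python share a common memory layout. The first field is a reference counter that tells you how many other objects currently refer to this object. The second field is a reference to the object's class or type. Since integers do not have a fixed size, the third field is the size of the data part (you can find the relevant definitions here (general objects) and here (integers/longs)):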

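// Offsets given as: index in 64-bit units / index in 32-bit units (on a 64-bit system)
struct longObject {
    native_int      ref_counter;  // offset: +0 / +0
    PyObject*       type;         // offset: +1 / +2
    native_int      size;         // offset: +2 / +4
    digit           value[];      // offset: +3 / +6 (digit: uint32_t holding 30 bits on typical 64-bit builds)
};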

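On a 32-bit system, native_int and PyObject* both occupy 32 bits, whereas on a 64-bit system they naturally occupy 64 bits. So if we access the data as 32-bit values (using ctypes.c_int) on a 64-bit system, the actual value of the integer is found at offset +6. If, on the other hand, you change the type to ctypes.c_long, the offset is +3 (on platforms where C long is 64 bits, such as 64-bit Linux and macOS; on 64-bit Windows, long is only 32 bits, so the offset stays +6).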
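The offsets can be derived from the platform's type sizes instead of being hard-coded. This sketch assumes the simplified three-field header above (ref_counter, type, size, each one machine word), as in standard, non-free-threaded CPython builds; it only does arithmetic and never touches object memory.

```python
import ctypes

# Assumption: each header field (ref_counter, type, size) is one machine word.
WORD = ctypes.sizeof(ctypes.c_ssize_t)
HEADER_BYTES = 3 * WORD          # bytes before the first digit: 24 on a 64-bit build

# Index to use with ctypes.POINTER(<type>) to land on the first digit.
index_as_c_int = HEADER_BYTES // ctypes.sizeof(ctypes.c_int)
index_as_c_long = HEADER_BYTES // ctypes.sizeof(ctypes.c_long)

print(index_as_c_int)    # 6 on 64-bit builds (c_int is 4 bytes)
print(index_as_c_long)   # 3 where long is 8 bytes; 6 on 64-bit Windows
```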

Since id(x) in CPython returns the memory address of x, you can actually check this yourself. Using the deref function above, let's do it:

>>> deref(id(29), ctypes.c_long)[3]
29
>>> deref(id(29), ctypes.c_long)[1]
10277248
>>> id(int)       # memory address of class "int"
10277248
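A slightly more portable, read-only variant of that check sizes each field with `ctypes.c_void_p` instead of `c_long`, so it also lines up on 64-bit Windows, and it only reads memory, never writes. It assumes a standard (non-free-threaded) CPython build whose digits are 30-bit values stored in 32-bit unsigned integers; `read_type_pointer` and `read_first_digit` are hypothetical helper names.

```python
import ctypes

WORD = ctypes.sizeof(ctypes.c_void_p)


def read_type_pointer(obj: object) -> int:
    """Read the second header field (the type pointer) of obj."""
    return ctypes.cast(id(obj), ctypes.POINTER(ctypes.c_void_p))[1]


def read_first_digit(n: int) -> int:
    """Read the first 30-bit digit of a small non-negative int (assumed layout)."""
    digits = ctypes.cast(id(n) + 3 * WORD, ctypes.POINTER(ctypes.c_uint32))
    return digits[0] & ((1 << 30) - 1)


print(read_type_pointer(29) == id(int))   # True: the header points at the int type
print(read_first_digit(29))               # 29
```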

Answer 3

Another way to think about what would happen if you internally "changed the value at the address of 1 to 17" is to print each element of range(3): you would see 0, 17, 2.