Random stealing calls to child initializer

Question

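There's a situation involving sub-classing that I can't figure out.

I'm sub-classing Random (the reason is beside the point). Here's a basic example of what I have: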

import random

class MyRandom(random.Random):
    def __init__(self, x):  # x isn't used here, but it's necessary to show the problem.
        print("Before")
        super().__init__()  # Nothing passed to parent
        print("After")

MyRandom([])

When run, the code above yields the following error (without printing "Before"):

>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\_\PycharmProjects\first\test.py", line 11, in <module>
    MyRandom([])
TypeError: unhashable type: 'list'

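This makes no sense to me. Somehow, the argument to MyRandom is apparently being passed directly to Random.__init__ even though I'm not passing it along, and the list is being treated as a seed. "Before" never prints, so apparently my initializer is never even called.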

I thought this might be because the parent of Random is implemented in C and that was causing the weirdness, but a similar case that sub-classes list doesn't yield an error about ints not being iterable:

class MyList(list):
    def __init__(self, y):
        print("Before")
        super().__init__() 
        print("After")


r = MyList(2)  # Prints "Before", "After"

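I don't even know how to approach this. I rarely sub-class, and even more rarely sub-class a built-in, so I must have a hole in my knowledge. This is not how I expect sub-classing to work. If anyone can explain what's going on here, I'd appreciate it.

Python 3.9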

Answer 1

I found a way to pass a list to a subclass of Random and use it in __init__.

import random
from typing import List


class MyRandom(random.Random):
    internal_list: List

    def __init__(self, x=None):
        if type(x) is list:
            print(f"Access to the list from `__init__`: {MyRandom.internal_list}")
            super().__init__(MyRandom.internal_list[0])
        else:
            super().__init__(x)

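    def __new__(cls, x=None):  # default matches __init__, so MyRandom() also works
        # Stored on the class, so the value is shared by every instance.
        cls.internal_list = x
        return super().__new__(cls)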

    def new_method(self):
        print(f"Access to the list from `new_method`: {MyRandom.internal_list}")

r1 = MyRandom([1, 2])
r1.new_method()
print(r1.random())

r2 = MyRandom([3, 4])
r2.new_method()
print(r2.random())

Output:

Access to the list from `__init__`: [1, 2]
Access to the list from `new_method`: [1, 2]
0.13436424411240122
Access to the list from `__init__`: [3, 4]
Access to the list from `new_method`: [3, 4]
0.23796462709189137

As an example, I use MyRandom.internal_list[0] to seed the PRNG. Of course, you need to check that the first element exists.
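A variation on the same idea, as a rough sketch rather than a tested recipe: keep the list on the instance instead of on the class, and handle the empty-list case. The class name ListSeededRandom is made up for this example, and falling back to a default seed when the list is empty is an assumption about what you would want.

```python
import random


class ListSeededRandom(random.Random):  # hypothetical name, for illustration only
    def __new__(cls, x=None):
        # Do not forward x, so Random.__new__ never tries to hash the list.
        self = super().__new__(cls)
        self.internal_list = x  # stored per instance, not on the class
        return self

    def __init__(self, x=None):
        if isinstance(x, list):
            # Assumption: an empty list means "use the default seeding".
            seed = x[0] if x else None
        else:
            seed = x
        super().__init__(seed)


r1 = ListSeededRandom([1, 2])
r2 = ListSeededRandom([3, 4])
print(r1.internal_list, r2.internal_list)  # [1, 2] [3, 4]
```

Because the list lives on the instance, creating r2 does not change what r1 sees.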

I'm not sure why __new__ is used when initializing MyRandom. It's definitely not documented, because in the PyCharm implementation I found this:

    @staticmethod # known case of __new__
    def __new__(*args, **kwargs): # real signature unknown
        """ Create and return a new object.  See help(type) for accurate signature. """
        pass

Answer 2

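So it has already been shown how to work around this problem, but it made me curious why it happens. I couldn't get to a definite answer, but I'm posting my findings here for anyone who wants to follow up.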

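So we know that when a new instance is created, __new__ is called first to create the actual instance (allocating memory at the C level). The newly created instance is then passed to the class' __init__ method.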
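That order can be seen with a plain Python class. This is only a small sketch; the Tracer class and the calls list are made up for illustration and have nothing to do with Random itself.

```python
calls = []  # records the order in which the two hooks run


class Tracer:  # hypothetical class, for illustration only
    def __new__(cls, *args, **kwargs):
        calls.append(("__new__", args))
        # Only the class is passed up; object.__new__ takes no extra arguments here.
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        calls.append(("__init__", args))


Tracer(1, 2)
print(calls)  # [('__new__', (1, 2)), ('__init__', (1, 2))]
```

Both hooks receive the constructor arguments, which is why a __new__ inherited from a base class sees them even if your __init__ never forwards them.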

Now, since even the printing of "Before" doesn't happen, it's safe to assume that the problem lies in the __new__ method. Indeed, when I override it:

def __new__(cls, *args, **kwargs):
    print("in new")
    return super().__new__(cls)

no error occurs and the print-out is as expected:

in new
Before
After

After adding *args to the super call:

return super().__new__(cls, *args)

the same error comes back. So this must be a problem in Random's __new__.

Inspecting the code with PyCharm, Random doesn't override its __new__ method, but the class signature is:

class Random(_random.Random):

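Trying to inspect this parent class showed a bunch of methods containing only pass. That looked weird, but after a quick search I found (which may not be surprising to some) that _random is implemented in C: its implementation is _randommodule.c.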

Now, I don't have much knowledge or experience with inspecting Python's C implementation, but I found what seem to be the basic slots of the Random class:

static PyType_Slot Random_Type_slots[] = {
    {Py_tp_doc, (void *)random_doc},
    {Py_tp_methods, random_methods},
    {Py_tp_new, PyType_GenericNew},
    {Py_tp_init, random_init},
    {Py_tp_free, PyObject_Free},
    {0, 0},
};

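My understanding is that the class' __init__ is mapped to random_init and the class' __new__ is mapped to PyType_GenericNew. But as its name suggests, PyType_GenericNew is just a generic object creator that allocates the necessary amount of memory for the object. Its body is a one-liner: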

PyObject *
PyType_GenericNew(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    return type->tp_alloc(type, 0);
}

It doesn't even use args.

On the other hand, the random_init function calls random_seed, which does have some hashing in it:

Py_hash_t hash = PyObject_Hash(arg);

But then again, we established that __init__ isn't even being called, so at this point I'm stumped...

Answer 3

Instantiating a class calls its __new__ method, passing it the class itself and the arguments from the constructor call. So MyRandom([1, 2]) results in the call MyRandom.__new__(MyRandom, [1, 2]). (3.9.10 documentation).

Because there is no MyRandom.__new__() method, the base classes are searched. random.Random does have a __new__() method (see random_new() in _randommodule.c). So we end up with a call like random_new(MyRandom, [1, 2]).

Looking at the C code for random_new(), it calls random_seed(self, [1, 2]). Because the second argument is not NULL, None, an int, or a subclass of int, the code calls PyObject_Hash([1, 2]). But a list isn't hashable, hence the error.
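The hashing failure is easy to reproduce from Python without any subclass. A small sketch: the first part shows the error that PyObject_Hash surfaces, the second shows that a list is rejected as a seed by random.Random itself. Only the exception type is printed for the second case, since the wording of that message may differ between Python versions.

```python
import random

# hash() is the Python-level counterpart of PyObject_Hash.
try:
    hash([1, 2])
except TypeError as exc:
    hash_error = exc
print(hash_error)  # unhashable type: 'list'

# A list is not an acceptable seed for random.Random either.
try:
    random.Random([1, 2])
except TypeError as exc:
    seed_error = exc
print(type(seed_error).__name__)  # TypeError
```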

If __new__() returns an instance of the class, then the __init__() method is called with the arguments from the constructor call.
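The "if" in that rule matters. As a quick sketch with a made-up class (Skips is not part of any library), when __new__ returns something that is not an instance of the class, __init__ is not called at all:

```python
class Skips:  # hypothetical class, for illustration only
    init_ran = False

    def __new__(cls, value):
        # Return an object that is not an instance of cls.
        return str(value)

    def __init__(self, value):
        Skips.init_ran = True


result = Skips(42)
print(repr(result), Skips.init_ran)  # '42' False
```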

One possible fix is to define a MyRandom.__new__() method that calls super().__new__() but passes only the appropriate arguments.

class MyRandom(random.Random):
    def __new__(cls, *args, **kwargs):
        #print(f"In __new__: {args=}, {kwargs=}")

        # Random.__new__ expects an optional seed. We are going to
        # implement our own RNG, so ignore args and kwargs. Pass in a
        # junk integer value so that Random.__new__ doesn't waste time
        # trying to access urandom or calling time to initialize the MT RNG
        # since we aren't going to use it anyway.
        return super().__new__(cls, 123)

    def __init__(self, *args, **kwargs):
        #print(f"In __init__: {args=}, {kwargs=}")

        # initialize your custom RNG here
        pass

Also override the methods random(), seed(), getstate(), setstate(), and optionally getrandbits().
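To make those overrides concrete, here is a rough sketch of a subclass backed by a toy 64-bit linear congruential generator. Everything about it is an assumption for illustration: the class name LcgRandom, the multiplier and increment constants, and the choice to use an int as the state. It is not a good generator and is not what random.Random does internally.

```python
import random


class LcgRandom(random.Random):  # hypothetical toy generator, illustration only
    _A = 6364136223846793005  # assumed LCG multiplier
    _C = 1442695040888963407  # assumed LCG increment
    _MASK = (1 << 64) - 1     # keep the state within 64 bits

    def __new__(cls, *args, **kwargs):
        # Keep constructor arguments away from Random.__new__.
        return super().__new__(cls)

    def __init__(self, seed=0):
        # In CPython's random.py, Random.__init__ calls self.seed(seed).
        super().__init__(seed)

    def seed(self, a=0, version=2):
        self._state = int(a) & self._MASK
        self.gauss_next = None  # reset the value cached by gauss()

    def random(self):
        self._state = (self._A * self._state + self._C) & self._MASK
        return (self._state >> 11) / (1 << 53)  # top 53 bits, in [0.0, 1.0)

    def getstate(self):
        return self._state

    def setstate(self, state):
        self._state = state


a, b = LcgRandom(7), LcgRandom(7)
same = [a.random() for _ in range(3)] == [b.random() for _ in range(3)]
print(same)  # True
```

Methods such as randint() and choice() are inherited and end up drawing from the overridden random().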

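Another fix is to use only keyword arguments in the subclass's __init__() method. The C code for random_new() checks whether an instance of random.Random itself is being created. If so, the code raises an error if there are any keyword arguments. However, if a subclass is being created, any keyword arguments are ignored by random_new() but can still be used in the subclass's __init__().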

class MyRandom(random.Random):
    def __init__(self, *, x):  # make x a keyword only argument
        print("Before")
        super().__init__()  # Nothing passed to parent
        print("After")

MyRandom(x=[])

Interestingly, in Python 3.10 the code for random_new was changed to raise an error if more than one positional argument is supplied.
