Improper use of __new__ to generate class instances?

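I'm creating some classes to deal with filenames on various types of file shares (nfs, afp, s3, local disk, etc.). As user input I get a string that identifies the data source (e.g. "nfs://192.168.1.3" or "s3://mybucket/data").

I'm subclassing the specific file systems from a base class that holds the common code. Where I'm confused is object creation. This is what I have: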

import os

class FileSystem(object):
    class NoAccess(Exception):
        pass

    def __new__(cls,path):
        if cls is FileSystem:
            if path.upper().startswith('NFS://'): 
                return super(FileSystem,cls).__new__(Nfs)
            else: 
                return super(FileSystem,cls).__new__(LocalDrive)
        else:
            # object.__new__() accepts no extra arguments in Python 3.
            return super(FileSystem,cls).__new__(cls)

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem):
    def __init__ (self,path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem):
    def __init__(self,path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return len([x for x in os.listdir(self.path) if os.path.isfile(os.path.join(self.path, x))])

data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('/var/log')

print(type(data1))
print(type(data2))

print(data2.count_files())

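I thought this would be a good use of __new__, but most posts I have read about it discourage its use. Is there a more accepted way to approach this problem?

In my opinion, using __new__ this way is confusing for other people who might read your code. It also requires somewhat hackish code to distinguish guessing the file system from user input from creating Nfs and LocalDrive via their own classes.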

Why not make a separate function for this behaviour? It can even be a static method of the FileSystem class:

class FileSystem(object):
    # other code ...

    @staticmethod
    def from_path(path):
        if path.upper().startswith('NFS://'): 
            return Nfs(path)
        else: 
            return LocalDrive(path)

And you call it like this:

data1 = FileSystem.from_path('nfs://192.168.1.18')
data2 = FileSystem.from_path('/var/log')

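I don't think using __new__() to do what you want is improper. In other words, I disagree with the accepted answer to this question, which claims that factory functions are always the "best way to do it".

If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method (but see the Python 3.6+ Update below). Given the choices available, making the __new__() method the factory is a very sensible approach, since it is static by default.

That said, below is what I think is an improved version of your code. I've added a couple of class methods to help find all the subclasses automatically. These support the most important improvement: adding subclasses no longer requires modifying the __new__() method. This means it is now easily extensible, since it effectively supports what you could call virtual constructors.

A similar implementation could also be used to move the creation of instances out of the __new__() method and into a separate (static) factory method. In that sense, the technique shown is just a relatively simple way of coding an extensible generic factory function, regardless of what name it is given.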
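
As a quick check of the "static by default" remark, the following minimal sketch inspects how Python stores a `__new__` method that was written without any decorator. The class name `Demo` is only an illustration and does not come from the code in this thread.

```python
class Demo(object):
    def __new__(cls, *args):
        # No @staticmethod decorator is written here, yet Python
        # wraps __new__ in one when the class is created.
        return object.__new__(cls)


# Look in the class __dict__ to see the raw, unbound attribute.
new_is_static = isinstance(Demo.__dict__['__new__'], staticmethod)
print(new_is_static)  # -> True
```

This is why `__new__` receives `cls` explicitly as its first argument rather than being bound the way an ordinary method is.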

# Works in Python 2 and 3.

import os
import re

class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')

    @classmethod
    def _get_all_subclasses(cls):
        """ Recursive generator of all class' subclasses. """
        for subclass in cls.__subclasses__():
            yield subclass
            for subclass in subclass._get_all_subclasses():
                yield subclass

    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        """ Create instance of appropriate subclass using path prefix. """
        path_prefix = cls._get_prefix(path)

        for subclass in cls._get_all_subclasses():
            if subclass.prefix == path_prefix:
                # Using "object" base class method avoids recursion here.
                return object.__new__(subclass)
        else:  # No subclass with matching prefix found (& no default defined)
            raise FileSystem.Unknown(
                'path "{}" has no known file system prefix'.format(path))

    def count_files(self):
        raise NotImplementedError


class Nfs(FileSystem):
    prefix = 'nfs'

    def __init__ (self, path):
        pass

    def count_files(self):
        pass


class LocalDrive(FileSystem):
    prefix = None  # Default when no file system prefix is found.

    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                     for filename in os.listdir(self.path))


if __name__ == '__main__':

    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.

    print(type(data1).__name__)  # -> Nfs
    print(type(data2).__name__)  # -> LocalDrive

    print(data2.count_files())  # -> <some number>
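
The remark that the same subclass lookup could live in a separate static factory method can be sketched roughly as follows. This is a simplified illustration, not the answer's code: it borrows the `from_path` name from the earlier answer, drops the `os.access` check so that it runs anywhere, and stores `path` in a shared `__init__`.

```python
import re


class FileSystem(object):
    class Unknown(Exception): pass

    prefix = None
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')

    def __init__(self, path):
        self.path = path

    @classmethod
    def _get_all_subclasses(cls):
        """ Recursive generator of all of this class' subclasses. """
        for subclass in cls.__subclasses__():
            yield subclass
            for nested in subclass._get_all_subclasses():
                yield nested

    @staticmethod
    def from_path(path):
        """ Factory: pick the subclass whose prefix matches the path. """
        match = FileSystem._PATH_PREFIX_PATTERN.match(path)
        path_prefix = match.group(1).lower() if match else None
        for subclass in FileSystem._get_all_subclasses():
            if subclass.prefix == path_prefix:
                return subclass(path)  # Ordinary construction, no __new__ override.
        raise FileSystem.Unknown(
            'path "{}" has no known file system prefix'.format(path))


class Nfs(FileSystem):
    prefix = 'nfs'


class LocalDrive(FileSystem):
    prefix = None  # Default when no file system prefix is found.


data1 = FileSystem.from_path('nfs://192.168.1.18')
data2 = FileSystem.from_path('/var/log')
print(type(data1).__name__)  # -> Nfs
print(type(data2).__name__)  # -> LocalDrive
```

One difference worth noting: with this variant a subclass can still be instantiated directly, e.g. `Nfs('nfs://192.168.1.18')`. In the `__new__`-based version above, that call would raise `FileSystem.Unknown`, because the lookup starts from `cls` and `Nfs` has no subclasses of its own.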

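Python 3.6+ Update

The code above works in both Python 2 and 3.x. However, Python 3.6 added a new class method to object named __init_subclass__(), which makes finding subclasses simpler: it can be used to build a "registry" of them automatically, instead of potentially having to check every subclass recursively as the _get_all_subclasses() method above does.

I got the idea of using __init_subclass__() for this from the Subclass registration section of the PEP 487 -- Simpler customisation of class creation proposal. Since the method is inherited by all of the base class' subclasses, sub-subclasses are registered automatically too (as opposed to only direct subclasses), which completely eliminates the need for a method like _get_all_subclasses().
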
# Requires Python 3.6+

import os
import re

class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    @classmethod
    def __init_subclass__(cls, path_prefix, **kwargs):  # No "/" marker: that syntax needs Python 3.8+.
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.

    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        """ Create instance of appropriate subclass. """
        path_prefix = cls._get_prefix(path)
        subclass = cls._registry.get(path_prefix)
        if subclass:
            return object.__new__(subclass)
        else:  # No subclass with matching prefix found (and no default).
            raise cls.Unknown(
                f'path "{path}" has no known file system prefix')

    def count_files(self):
        raise NotImplementedError


class Nfs(FileSystem, path_prefix='nfs'):
    def __init__ (self, path):
        pass

    def count_files(self):
        pass

class Ufs(Nfs, path_prefix='ufs'):
    def __init__ (self, path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise self.NoAccess(f'Cannot read directory {path!r}')
        self.path = path

    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                     for filename in os.listdir(self.path))


if __name__ == '__main__':

    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.
    data4 = FileSystem('ufs://192.168.1.18')

    print(type(data1))  # -> <class '__main__.Nfs'>
    print(type(data2))  # -> <class '__main__.LocalDrive'>
    print(f'file count: {data2.count_files()}')  # -> file count: <some number>

    try:
        data3 = FileSystem('c:/foobar')  # A non-existent directory.
    except FileSystem.NoAccess as exc:
        print(f'{exc} - FileSystem.NoAccess exception raised as expected')
    else:
        raise RuntimeError("Non-existent path should have raised Exception!")

    try:
        data4 = FileSystem('foobar://42')  # Unregistered path prefix.
    except FileSystem.Unknown as exc:
        print(f'{exc} - FileSystem.Unknown exception raised as expected')
    else:
        raise RuntimeError("Unregistered path prefix should have raised Exception!")

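Edit [BLUF]: There is nothing wrong with the answer provided by @martineau. This post simply follows up, for completeness, to discuss a potential error encountered when a class definition uses additional keywords that are not managed by the metaclass.

I'd like to supply some additional information on using __init_subclass__ together with __new__ as a factory. The answer @martineau posted is very useful, and I have implemented a modified version of it in my own programs, because I prefer using the class creation sequence over adding a factory method to the namespace; this is very similar to how pathlib.Path is implemented.

To follow up on a comment trail with @martineau, I have taken the following snippet from his answer: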
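
For readers who have not seen the pathlib.Path behaviour mentioned above, this small sketch shows the visible effect: calling the base class returns an instance of a platform-specific subclass. It only observes the result and makes no claim about how the standard library implements it internally.

```python
import pathlib

p = pathlib.Path('.')

# The concrete type depends on the operating system:
# PosixPath on POSIX systems, WindowsPath on Windows.
print(type(p).__name__)
print(isinstance(p, pathlib.Path))  # -> True
```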

import os
import re

class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    @classmethod
    def __init_subclass__(cls, **kwargs):  # No "/" marker: that syntax needs Python 3.8+.
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.

    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        """ Create instance of appropriate subclass. """
        path_prefix = cls._get_prefix(path)
        subclass = FileSystem._registry.get(path_prefix)
        if subclass:
            # Using "object" base class method avoids recursion here.
            return object.__new__(subclass)
        else:  # No subclass with matching prefix found (and no default).
            raise FileSystem.Unknown(
                f'path "{path}" has no known file system prefix')

    def count_files(self):
        raise NotImplementedError


class Nfs(FileSystem, path_prefix='nfs'):
    def __init__ (self, path):
        pass

    def count_files(self):
        pass


class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                     for filename in os.listdir(self.path))


if __name__ == '__main__':

    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.

    print(type(data1).__name__)  # -> Nfs
    print(type(data2).__name__)  # -> LocalDrive

    print(data2.count_files())  # -> <some number>

    try:
        data3 = FileSystem('foobar://42')  # Unregistered path prefix.
    except FileSystem.Unknown as exc:
        print(str(exc), '- raised as expected')
    else:
        raise RuntimeError(
              "Unregistered path prefix should have raised Exception!")

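This answer works as written, but I wish to address a few items (potential pitfalls) that others may run into, either through inexperience or because of codebase standards their team requires.

Firstly, regarding the decorator on __init_subclass__, per the PEP: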

One could require the explicit use of @classmethod on the __init_subclass__ decorator. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.

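This is not a problem, since the decorator is already implied, although the Zen tells us "explicit is better than implicit". Nevertheless, if you are abiding by the PEPs, there you go (and the rationale is explained there).

In my own implementation of a similar solution, subclasses are not defined with an additional keyword argument, as @martineau does here: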
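
A minimal sketch of the implicit conversion the PEP describes, assuming Python 3.6+. The names `Base`, `Child`, and `seen` are illustrative only.

```python
class Base:
    seen = []

    # No @classmethod decorator: Python applies it implicitly.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Base.seen.append(cls.__name__)


class Child(Base):
    pass


is_classmethod = isinstance(Base.__dict__['__init_subclass__'], classmethod)
print(is_classmethod)  # -> True
print(Base.seen)       # -> ['Child']
```

So writing `@classmethod` explicitly, as in the snippet above, is harmless but not required.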

class Nfs(FileSystem, path_prefix='nfs'): ...
class LocalDrive(FileSystem, path_prefix=None): ...

When browsing through the PEP:

As a second change, the new type.__init__ just ignores keyword arguments. Currently, it insists that no keyword arguments are given. This leads to a (wanted) error if one gives keyword arguments to a class declaration if the metaclass does not process them. Metaclass authors that do want to accept keyword arguments must filter them out by overriding __init__.

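Why is this (potentially) problematic? Well, there are several questions describing problems around additional keyword arguments in a class definition, the use of metaclasses (and therefore the metaclass= keyword), and an overridden __init_subclass__. However, that doesn't explain why it works in the solution given here. The answer: kwargs.pop().

If we look at the following: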

# code in CPython 3.7

import os
import re

class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    def __init_subclass__(cls, **kwargs):
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.

    ...

class Nfs(FileSystem, path_prefix='nfs'): ...

This will still run without issue, but if we remove the kwargs.pop():

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)  # throws TypeError
        cls._registry[path_prefix] = cls  # Add class to registry.

The error thrown is already known and is described in the PEP:

In the new code, it is not __init__ that complains about keyword arguments, but __init_subclass__, whose default implementation takes no arguments. In a classical inheritance scheme using the method resolution order, each __init_subclass__ may take out it's keyword arguments until none are left, which is checked by the default implementation of __init_subclass__.

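What is happening is that the path_prefix= keyword is "popped" off of kwargs rather than merely read, so **kwargs is empty by the time it is passed up the MRO, and it therefore complies with the default implementation (which accepts no keyword arguments).

To avoid this entirely, I propose not relying on kwargs, and instead using what is already present in the call to __init_subclass__, namely the cls reference: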
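
The difference can be reproduced in isolation with the sketch below (Python 3.6+). The class names `Popping`, `Forwarding`, `Ok`, and `Broken` are made up for the demonstration. One base consumes the keyword before delegating; the other forwards it untouched to the default implementation.

```python
class Popping:
    _registry = {}

    def __init_subclass__(cls, **kwargs):
        # Consume the keyword so nothing is left for object.__init_subclass__.
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls


class Forwarding:
    def __init_subclass__(cls, **kwargs):
        # path_prefix is still in kwargs and reaches the default implementation.
        super().__init_subclass__(**kwargs)


class Ok(Popping, path_prefix='nfs'):
    pass


try:
    class Broken(Forwarding, path_prefix='nfs'):
        pass
except TypeError as exc:
    error = exc
else:
    error = None

print(list(Popping._registry))  # -> ['nfs']
print(type(error).__name__)     # -> TypeError
```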

# code in CPython 3.7

import os
import re

class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry[cls._path_prefix] = cls  # Add class to registry.

    ...

class Nfs(FileSystem):
    _path_prefix = 'nfs'

    ...

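Storing the prefix as a class attribute also helps in later methods if one needs to refer back to the particular prefix used by the subclass (via self._path_prefix). To my knowledge, you cannot refer back to keywords supplied in the class definition (not without some complexity), so this seemed both trivial and useful.

So, to @martineau: I apologize if my comments seemed confusing. There is only so much space to type them in, and as shown here, the explanation needed more detail.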
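
A self-contained sketch of the class-attribute approach, reduced to the registration part. Two things here are assumptions added for illustration and are not part of the answer above: the `'_path_prefix' in cls.__dict__` guard, and the `describe` helper and `CachedNfs` class.

```python
class FileSystem:
    _registry = {}  # Maps prefix -> subclass.

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Assumption: register only classes that declare their own prefix,
        # so a subclass that merely inherits one does not replace its parent.
        if '_path_prefix' in cls.__dict__:
            cls._registry[cls._path_prefix] = cls

    def describe(self):  # Hypothetical helper showing self._path_prefix in use.
        return '{} handles prefix {!r}'.format(
            type(self).__name__, self._path_prefix)


class Nfs(FileSystem):
    _path_prefix = 'nfs'


class CachedNfs(Nfs):  # Hypothetical sub-subclass that inherits the prefix.
    pass


class LocalDrive(FileSystem):
    _path_prefix = None  # Default file system.


print(FileSystem._registry['nfs'].__name__)  # -> Nfs
print(CachedNfs().describe())  # -> CachedNfs handles prefix 'nfs'
```

Without the guard, `cls._registry[cls._path_prefix] = cls` would let `CachedNfs` silently take over the 'nfs' entry, and a direct subclass of `FileSystem` that forgot to set `_path_prefix` would fail with an AttributeError at class creation. Whether either behaviour is desirable depends on the codebase.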