Annotate dataclass class variable with type value

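We have a number of dataclasses representing various results, all sharing a common ancestor `Result`. Each result then provides its data using its own subclass of `ResultData`. But we have trouble annotating this case properly.

We came up with the following solution: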

from dataclasses import dataclass
from typing import ClassVar, Generic, Optional, Sequence, Type, TypeVar


class ResultData:
    ...


T = TypeVar('T', bound=ResultData)


@dataclass
class Result(Generic[T]):
    _data_cls: ClassVar[Type[T]]
    data: Sequence[T]

    @classmethod
    def parse(cls, ...) -> 'Result[T]':  # returns the result object, not a T
        self = cls()
        self.data = [self._data_cls.parse(...)]
        return self

class FooResultData(ResultData):
    ...

class FooResult(Result):
    _data_cls = FooResultData

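But it recently stopped working, failing with the mypy error `ClassVar cannot contain type variables [misc]`. It also goes against PEP 526, see https://www.python.org/dev/peps/pep-0526/#class-and-instance-variable-annotations, which we had missed earlier.

Is there a way to annotate this case properly?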

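As hinted in the comments, the `_data_cls` attribute could be removed, assuming it is only being used for type hinting purposes. The correct way to annotate a generic class defined like `class MyClass(Generic[T])` is to use `MyClass[MyType]` in the type annotations.

For example, hopefully the code below works in mypy. I only tested it in PyCharm, where it seems to infer the types well enough at least.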

from dataclasses import dataclass
from functools import cached_property
from typing import Generic, Sequence, TypeVar, Any, Type


T = TypeVar('T', bound='ResultData')


class ResultData:
    ...


@dataclass
class Result(Generic[T]):
    data: Sequence[T]

    @cached_property
    def data_cls(self) -> Type[T]:
        """Get generic type arg to Generic[T] using `__orig_class__` attribute"""
        # noinspection PyUnresolvedReferences
        return self.__orig_class__.__args__[0]

    def parse(self):
        print(self.data_cls)


@dataclass
class FooResultData(ResultData):
    # can be removed
    this_is_a_test: Any = 'testing'


class AnotherResultData(ResultData): ...


# indicates `data` is a list of `FooResultData` objects
FooResult = Result[FooResultData]

# indicates `data` is a list of `AnotherResultData` objects
AnotherResult = Result[AnotherResultData]

f: FooResult = FooResult([FooResultData()])
f.parse()
_ = f.data[0].this_is_a_test  # no warnings

# use a new name: re-annotating `f` with a different type is flagged by type checkers
a: AnotherResult = AnotherResult([AnotherResultData()])
a.parse()

Output:

<class '__main__.FooResultData'>
<class '__main__.AnotherResultData'>
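One caveat about the approach above: `__orig_class__` is an undocumented implementation detail of `typing`, and as far as I know it is only attached to instances created through a subscripted alias such as `Result[FooResultData](...)`, and only after `__init__` has returned. The sketch below reuses the class names from the example; `FooResultSubclass` and the `has_orig_class` helper are hypothetical names added purely for illustration.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar


class ResultData:
    ...


T = TypeVar('T', bound=ResultData)


@dataclass
class Result(Generic[T]):
    data: Sequence[T]


class FooResultData(ResultData):
    ...


# Hypothetical subclass, to compare with the alias style used above
class FooResultSubclass(Result[FooResultData]):
    ...


def has_orig_class(obj: object) -> bool:
    """Hypothetical helper: did typing attach `__orig_class__` to this instance?"""
    return hasattr(obj, '__orig_class__')


via_alias = Result[FooResultData]([FooResultData()])   # subscripted alias
unsubscripted = Result([FooResultData()])              # plain class call
via_subclass = FooResultSubclass([FooResultData()])    # parametrized subclass

print(has_orig_class(via_alias))
print(has_orig_class(unsubscripted))
print(has_orig_class(via_subclass))
```

So the `data_cls` property would presumably raise `AttributeError` for the last two instances, which is worth keeping in mind if results are ever created through subclasses rather than aliases.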

Of course, I can only offer evidence that the `__orig_class__` approach seems to work for me in PyCharm.

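In the end, I just replaced the type variable in the `_data_cls` annotation with the base class and fixed the annotation of the subclasses, as suggested in the answer above.

The downside is that the result data class has to be named twice in every subclass, but in my opinion that is more legible than extracting the class in a property.

Full solution: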
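If the repetition ever becomes a nuisance, one possible alternative (not part of the solution described here, and not checked against mypy) is to fill in `_data_cls` automatically from the parametrized base class. The sketch below assumes Python 3.8+ for `typing.get_args` and `typing.get_origin`; the `__init_subclass__` hook is my own addition.

```python
from dataclasses import dataclass
from typing import ClassVar, Generic, Sequence, Type, TypeVar, get_args, get_origin


class ResultData:
    ...


T = TypeVar('T', bound=ResultData)


@dataclass
class Result(Generic[T]):
    _data_cls: ClassVar[Type[ResultData]]
    data: Sequence[T]

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Look for a parametrized base such as Result[FooResultData]
        for base in getattr(cls, '__orig_bases__', ()):
            if get_origin(base) is Result:
                (arg,) = get_args(base)
                # Skip bases that are still generic, e.g. Result[T]
                if isinstance(arg, type):
                    cls._data_cls = arg


class FooResultData(ResultData):
    ...


class FooResult(Result[FooResultData]):  # the data class is named only once
    ...


print(FooResult._data_cls)
```

This trades the explicit assignment for a bit of runtime introspection, so whether it is actually clearer is a matter of taste.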

from dataclasses import dataclass
from typing import ClassVar, Generic, Optional, Sequence, Type, TypeVar


class ResultData:
    ...


T = TypeVar('T', bound=ResultData)


@dataclass
class Result(Generic[T]):
    _data_cls: ClassVar[Type[ResultData]]  # Fixed annotation here
    data: Sequence[T]

    @classmethod
    def parse(cls, ...) -> 'Result[T]':  # returns the result object, not a T
        self = cls()
        self.data = [self._data_cls.parse(...)]
        return self

class FooResultData(ResultData):
    ...

class FooResult(Result[FooResultData]):  # Fixed annotation here
    _data_cls = FooResultData
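
Since the `...` placeholders above are elided, here is a minimal runnable sketch of the same idea. The `raw: str` argument, the `ResultData.__init__` and `ResultData.parse` bodies, and the `cast` are assumptions made only for illustration. I have checked the runtime behaviour, not the mypy output; the `cast` is there because `_data_cls` is annotated with the base class, so a checker cannot know that the parsed items are `T`.

```python
from dataclasses import dataclass
from typing import ClassVar, Generic, Sequence, Type, TypeVar, cast


class ResultData:
    def __init__(self, raw: str) -> None:
        self.raw = raw  # hypothetical payload, not in the original

    @classmethod
    def parse(cls, raw: str) -> 'ResultData':
        # Hypothetical parser: simply wraps the raw input
        return cls(raw)


T = TypeVar('T', bound=ResultData)


@dataclass
class Result(Generic[T]):
    _data_cls: ClassVar[Type[ResultData]]  # base class, no type variable
    data: Sequence[T]

    @classmethod
    def parse(cls, raw: str) -> 'Result[T]':
        # Build the items with the subclass-specific data class
        items = [cls._data_cls.parse(raw)]
        return cls(data=cast(Sequence[T], items))


class FooResultData(ResultData):
    ...


class FooResult(Result[FooResultData]):
    _data_cls = FooResultData


result = FooResult.parse('payload')
print(type(result).__name__, type(result.data[0]).__name__)
```

Note that `data` is passed to the constructor here, because the dataclass-generated `__init__` requires it.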