Unable to access a variable defined in a dataclass

I have a dataclass as follows:

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict

raw_dir = r"C:..." # path of the raw dir
processed_dir = r"C:..." # path of the processed dir

@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)

    path_dict: Dict[str, Any] = field(
        default_factory=lambda: {
            "raw_train_file": Path(raw_path, "raw_train.csv"),
            "processed_train_file": Path(processed_path, "processed_train.csv"),
        }
    )
Files().path_dict

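This raises the error name "raw_path" is not defined. However, if I print raw_path right after the first line of the class, it works, so the problem probably comes from path_dict. I tried replacing the key-value pairs with "raw": Path(directory) and that worked, so I don't think it is a data type issue.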


Context: I treat the dataclass as a config file (func), so that whenever I need a default path I can use:

pd.read_csv(Files().path_dict["raw_train_file"])

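Your problem is that default_factory must be a zero-argument callable, so it receives no reference to the instance and cannot use any member variables. In addition, names defined in the class body are not visible inside the lambda, so raw_path is looked up as a global when the lambda runs and is not found. Here, since the member variables have simple initializers, you can repeat the initialization and use only the global variables: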

...
path_dict: Dict[str, Any] = field(
    default_factory=lambda: {
        "raw_train_file": Path(Path(raw_dir), "raw_train.csv"),
        "processed_train_file": Path(Path(processed_dir), "processed_train.csv"),
    }
)
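To make the scoping point concrete, here is a minimal sketch that contrasts a factory referencing a class-level name with one referencing only a module-level name. The names base_dir, Broken, and Working are hypothetical and used only for illustration; base_dir stands in for raw_dir.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict

base_dir = "data"  # hypothetical module-level default, stands in for raw_dir


@dataclass
class Broken:
    raw_path: Path = Path(base_dir)
    # The lambda body looks up raw_path in the module scope, not the class body,
    # so calling the factory fails because no global raw_path exists.
    path_dict: Dict[str, Any] = field(
        default_factory=lambda: {"raw_train_file": Path(raw_path, "raw_train.csv")}
    )


@dataclass
class Working:
    raw_path: Path = Path(base_dir)
    # Only module-level names are referenced, so the lookup succeeds.
    path_dict: Dict[str, Any] = field(
        default_factory=lambda: {"raw_train_file": Path(base_dir, "raw_train.csv")}
    )


try:
    Broken()
    broken_error = None
except NameError as exc:
    broken_error = exc  # name 'raw_path' is not defined

working = Working()
print(repr(broken_error))
print(working.path_dict)
```

One consequence of this approach: path_dict is always built from the global, so passing a different raw_path to the constructor does not change the paths in path_dict.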

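But you can also use the special __post_init__ method, which the generated __init__ calls after the rest of the initialization. Because it receives self, it can use the member variables: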

@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)

    def __post_init__(self):
        self.path_dict: Dict[str, Any] = {
            "raw_train_file": Path(self.raw_path, "raw_train.csv"),
            "processed_train_file": Path(self.processed_path, "processed_train.csv"),
        }
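A possible variation, sketched here under the assumption that you want path_dict to appear in the dataclass's field list (and therefore in repr and equality checks), is to declare it with field(init=False) and still fill it in __post_init__. The directory values below are hypothetical placeholders, not the paths from the question.

```python
from dataclasses import dataclass, field, fields
from pathlib import Path
from typing import Any, Dict

raw_dir = "raw"  # hypothetical placeholder directory
processed_dir = "processed"  # hypothetical placeholder directory


@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)
    # init=False keeps path_dict out of __init__ while still registering it as a field.
    path_dict: Dict[str, Any] = field(init=False)

    def __post_init__(self):
        # self is available here, so instance values (including overrides) are used.
        self.path_dict = {
            "raw_train_file": Path(self.raw_path, "raw_train.csv"),
            "processed_train_file": Path(self.processed_path, "processed_train.csv"),
        }


default_files = Files()
custom_files = Files(raw_path=Path("elsewhere"))
field_names = [f.name for f in fields(Files)]

print(default_files.path_dict["raw_train_file"])
print(custom_files.path_dict["raw_train_file"])
print(field_names)
```

Unlike the default_factory version, the dictionary here follows whatever raw_path and processed_path the instance was constructed with.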

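Another option is to use functools.cached_property, which avoids having to define a __post_init__ method in the dataclass. Note that I am not suggesting this is in any way a "better" solution, just another way to achieve the same goal.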

from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import Any, Dict

raw_dir = r"C:..."  # path of the raw dir
processed_dir = r"C:..."  # path of the processed dir


@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)

    @cached_property
    def path_dict(self) -> Dict[str, Any]:
        return {
            "raw_train_file": Path(self.raw_path, "raw_train.csv"),
            "processed_train_file": Path(self.processed_path, "processed_train.csv"),
        }


print(Files().path_dict)

Output:

{'raw_train_file': PosixPath('C:.../raw_train.csv'), 'processed_train_file': PosixPath('C:.../processed_train.csv')}
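One behavior of cached_property worth knowing is that the value is computed on first access and then reused, so later changes to raw_path are not picked up until the cached attribute is deleted. The following is a minimal sketch with a hypothetical single-field version of Files and placeholder directory names:

```python
from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import Any, Dict


@dataclass
class Files:
    raw_path: Path = Path("raw")  # hypothetical placeholder directory

    @cached_property
    def path_dict(self) -> Dict[str, Any]:
        # Runs once per instance, on first access.
        return {"raw_train_file": Path(self.raw_path, "raw_train.csv")}


files = Files()
first = files.path_dict  # computed and stored on the instance

files.raw_path = Path("moved")
second = files.path_dict  # still the cached dictionary built from "raw"

del files.path_dict  # clears the cached value
third = files.path_dict  # recomputed from the current raw_path

print(first is second)
print(third["raw_train_file"])
```

If the paths need to track changes to the fields automatically, a plain property, which recomputes on every access, may be a better fit than cached_property.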