Unable to call variable defined in dataclass
I have a dataclass like this:
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict

raw_dir = r"C:..."  # path of the raw dir
processed_dir = r"C:..."  # path of the processed dir

@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)
    path_dict: Dict[str, Any] = field(
        default_factory=lambda: {
            "raw_train_file": Path(raw_path, "raw_train.csv"),
            "processed_train_file": Path(processed_path, "processed_train.csv"),
        }
    )

Files().path_dict
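This raises the error: name 'raw_path' is not defined.
However, printing raw_path right after the line that defines it works, so the problem probably comes from path_dict. I tried replacing the key-value pairs with "raw": Path(directory) and that worked, so I don't think it is a data type problem.
Context: I treat the dataclass as a config file (func), so that whenever I need a default path I can write: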
pd.read_csv(Files().path_dict["raw_train_file"])
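Your problem is that default_factory must be a zero-argument callable. It is called without the instance, so it cannot use any member variable. On top of that, names inside the lambda body are looked up in the module's global scope when the lambda runs, not in the class body, which is why raw_path is reported as undefined. Here, since the member variables have trivial initializers, you can simply repeat the initialization and use only the global variables: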
...
    path_dict: Dict[str, Any] = field(
        default_factory=lambda: {
            "raw_train_file": Path(Path(raw_dir), "raw_train.csv"),
            "processed_train_file": Path(Path(processed_dir), "processed_train.csv"),
        }
    )
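The scoping rule behind the error can be reproduced without dataclasses at all. The following is only a minimal sketch; the class name Demo and the attribute names value and getter are made up for illustration:

```python
class Demo:
    value = 1
    # The lambda body is its own function scope. Names assigned in the class
    # body (such as `value`) are not visible from it, so the lookup goes
    # straight to the module globals, where `value` does not exist.
    getter = lambda: value


try:
    Demo.getter()
    outcome = "no error"
except NameError as exc:
    outcome = f"NameError: {exc}"

print(outcome)
```

The lambda passed to default_factory in the question behaves the same way: it only fails when it is actually called, which is why the class definition itself succeeds and the error shows up at Files().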
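Alternatively, you can use the special __post_init__ method, which the generated __init__ calls after the other fields have been initialized. Because it receives the self parameter, it can use member variables: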
@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)

    def __post_init__(self):
        self.path_dict: Dict[str, Any] = {
            "raw_train_file": Path(self.raw_path, "raw_train.csv"),
            "processed_train_file": Path(self.processed_path, "processed_train.csv"),
        }
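If you also want path_dict to be a real dataclass field (so that it appears in the repr and in comparisons), one possible variation is to declare it with field(init=False) and fill it in __post_init__. This is a sketch under assumptions: the directory names "data/raw", "data/processed" and "other/raw" are placeholders, not the paths from the question.

```python
from dataclasses import dataclass, field, fields
from pathlib import Path
from typing import Dict


@dataclass
class Files:
    raw_path: Path = Path("data/raw")  # placeholder path (assumption)
    processed_path: Path = Path("data/processed")  # placeholder path (assumption)
    # init=False keeps path_dict out of the generated __init__ signature;
    # the value is assigned in __post_init__ instead.
    path_dict: Dict[str, Path] = field(init=False)

    def __post_init__(self) -> None:
        self.path_dict = {
            "raw_train_file": self.raw_path / "raw_train.csv",
            "processed_train_file": self.processed_path / "processed_train.csv",
        }


default_files = Files()
custom_files = Files(raw_path=Path("other/raw"))  # override one directory
field_names = [f.name for f in fields(Files)]

print(default_files.path_dict["raw_train_file"])
print(custom_files.path_dict["raw_train_file"])
```

Since the dictionary is built from self.raw_path, overriding raw_path at construction time is reflected in path_dict, which the global-variable version cannot do.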
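Another option is to use functools.cached_property, which avoids having to define a __post_init__ method in the dataclass. Note that I'm not suggesting this is in any way a "better" solution, just another way to achieve the same goal.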
from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import Any, Dict

raw_dir = r"C:..."  # path of the raw dir
processed_dir = r"C:..."  # path of the processed dir

@dataclass
class Files:
    raw_path: Path = Path(raw_dir)
    processed_path: Path = Path(processed_dir)

    @cached_property
    def path_dict(self) -> Dict[str, Any]:
        return {
            "raw_train_file": Path(self.raw_path, "raw_train.csv"),
            "processed_train_file": Path(self.processed_path, "processed_train.csv"),
        }

print(Files().path_dict)
Output:
{'raw_train_file': PosixPath('C:.../raw_train.csv'), 'processed_train_file': PosixPath('C:.../processed_train.csv')}
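One thing worth knowing about the cached_property variant: the dictionary is computed on first access and then stored on the instance, so a later change to raw_path is not picked up unless the cached attribute is deleted. A small sketch of that behaviour, again using placeholder directories ("data/raw", "other/raw") rather than the paths from the question:

```python
from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import Dict


@dataclass
class Files:
    raw_path: Path = Path("data/raw")  # placeholder path (assumption)

    @cached_property
    def path_dict(self) -> Dict[str, Path]:
        return {"raw_train_file": self.raw_path / "raw_train.csv"}


files = Files()
first = files.path_dict            # computed and cached on first access
files.raw_path = Path("other/raw")
second = files.path_dict           # served from the cache, not recomputed

same_object = first is second
stale = second["raw_train_file"] == Path("data/raw") / "raw_train.csv"

del files.path_dict                # clears the cached value
refreshed = files.path_dict["raw_train_file"]
print(same_object, stale, refreshed)
```

For a config object whose paths never change after construction this makes no difference; it only matters if you mutate the instance afterwards.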