How to implement a repository for lazy data loading with neuraxle?
The Neuraxle documentation has an example that uses a repository to lazy-load data in a pipeline. See the following code:
from neuraxle.pipeline import Pipeline, MiniBatchSequentialPipeline
from neuraxle.base import ExecutionContext
from neuraxle.steps.column_transformer import ColumnTransformer
from neuraxle.steps.flow import TrainOnlyWrapper

training_data_ids = training_data_repository.get_all_ids()
context = ExecutionContext('caching_folder').set_service_locator({
    BaseRepository: training_data_repository
})

pipeline = Pipeline([
    ConvertIDsToLoadedData().assert_has_services(BaseRepository),
    ColumnTransformer([
        (range(0, 2), DateToCosineEncoder()),
        (3, CategoricalEnum(categories_count=5, starts_at_zero=True)),
    ]),
    Normalizer(),
    TrainOnlyWrapper(DataShuffler()),
    MiniBatchSequentialPipeline([
        Model()
    ], batch_size=128)
]).with_context(context)
However, it doesn't show how to implement the BaseRepository and ConvertIDsToLoadedData classes. What is the best way to implement these classes? Can anyone give an example?
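Stripped of Neuraxle, the pattern in that example is: pass only IDs through the pipeline and resolve them into data through a repository at the point where the data is actually needed. The plain-Python sketch below illustrates that deferral by resolving IDs one mini-batch at a time. DictRepository, iter_loaded_batches and the load_calls counter are hypothetical names chosen for illustration, not Neuraxle API. Note that in the documentation example the loading step sits before the MiniBatchSequentialPipeline, so there the IDs are presumably resolved in one pass over the whole data container. Per-batch loading is shown here only to make the deferral visible.

```python
from typing import Dict, Iterator, List


class DictRepository:
    """Hypothetical repository backed by a dict; counts how many records were loaded."""

    def __init__(self, data: Dict[int, object]):
        self._data = data
        self.load_calls = 0  # number of records actually fetched so far

    def get_all_ids(self) -> List[int]:
        return list(self._data.keys())

    def get_data_from_id(self, _id: int) -> object:
        self.load_calls += 1
        return self._data[_id]


def iter_loaded_batches(repo: DictRepository, ids: List[int], batch_size: int) -> Iterator[List[object]]:
    """Yield mini-batches of loaded records, fetching each batch only when it is requested."""
    for start in range(0, len(ids), batch_size):
        batch_ids = ids[start:start + batch_size]
        # Only this batch's IDs are resolved into data here.
        yield [repo.get_data_from_id(_id) for _id in batch_ids]


repo = DictRepository({i: f"record-{i}" for i in range(300)})
batches = iter_loaded_batches(repo, repo.get_all_ids(), batch_size=128)
first = next(batches)
print(len(first), repo.load_calls)  # → 128 128
```

Because iter_loaded_batches is a generator, nothing is loaded until the first batch is requested. After that, only the 128 records of that batch have been fetched.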
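I haven't checked whether the following runs, but it should look something like this. If you find something to change after trying it, please edit this answer: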
from abc import ABC, abstractmethod
from typing import Dict, List

from neuraxle.base import BaseStep, ExecutionContext
from neuraxle.data_container import DataContainer


class BaseDataRepository(ABC):
    @abstractmethod
    def get_all_ids(self) -> List[int]:
        pass

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object:
        pass


class InMemoryDataRepository(BaseDataRepository):
    def __init__(self, ids, data):
        self.ids: List[int] = ids
        self.data: Dict[int, object] = data

    def get_all_ids(self) -> List[int]:
        return list(self.ids)

    def get_data_from_id(self, _id: int) -> object:
        return self.data[_id]


class ConvertIDsToLoadedData(BaseStep):
    def _transform_data_container(self, data_container: DataContainer, context: ExecutionContext) -> DataContainer:
        # Fetch whichever repository was registered in the context's service locator.
        repo: BaseDataRepository = context.get_service(BaseDataRepository)
        ids = data_container.data_inputs
        # Replace data ids by their loaded object counterpart:
        data_container.data_inputs = [repo.get_data_from_id(_id) for _id in ids]
        # Assumes Neuraxle 0.5.x, where this hook returns only the DataContainer.
        return data_container


# `ids` and `data` are your own list of IDs and {id: data} mapping.
context = ExecutionContext('caching_folder').set_service_locator({
    # Or insert here any other class that inherits from `BaseDataRepository` once you
    # switch to a real database (e.g. SQL) instead of this cheap "InMemory" stub.
    BaseDataRepository: InMemoryDataRepository(ids, data)
})
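The comment above suggests swapping the in-memory stub for a real database. Here is a rough sketch of what that could look like, using a repository backed by Python's built-in sqlite3 module. It redefines the same two-method interface so that it runs on its own. The class name SQLiteDataRepository, the table name samples and its (id, payload) schema are assumptions for illustration, and payloads are assumed to be stored as JSON text.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import List


class BaseDataRepository(ABC):
    @abstractmethod
    def get_all_ids(self) -> List[int]:
        ...

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object:
        ...


class SQLiteDataRepository(BaseDataRepository):
    """Hypothetical SQL-backed repository; assumes a table samples(id INTEGER PRIMARY KEY, payload TEXT)."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def get_all_ids(self) -> List[int]:
        # Only the IDs are read up front; payloads stay in the database.
        rows = self.connection.execute("SELECT id FROM samples ORDER BY id").fetchall()
        return [row[0] for row in rows]

    def get_data_from_id(self, _id: int) -> object:
        # Each payload is fetched on demand, when the pipeline step asks for it.
        row = self.connection.execute(
            "SELECT payload FROM samples WHERE id = ?", (_id,)
        ).fetchone()
        if row is None:
            raise KeyError(_id)
        return json.loads(row[0])


# Build a throwaway in-memory database to exercise the repository.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO samples (id, payload) VALUES (?, ?)",
    [(i, json.dumps({"x": i * 10})) for i in range(3)],
)
repo = SQLiteDataRepository(conn)
print(repo.get_all_ids())        # → [0, 1, 2]
print(repo.get_data_from_id(2))  # → {'x': 20}
```

In a real project, subclass the single BaseDataRepository defined once, because the service locator is keyed on that exact class object. Registering such an instance instead of InMemoryDataRepository should then leave ConvertIDsToLoadedData unchanged, which is the point of the abstraction.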
For updates, see the issue I opened about this here: https://github.com/Neuraxio/Neuraxle/issues/421