Pipeline can't find nodes in kedro
I am following the pipelines tutorial, created all the required files, and launched kedro with kedro run --node=preprocessing_data, but I got this error message:
ValueError: Pipeline does not contain nodes named ['preprocessing_data'].
If I run kedro without the node argument, I get
kedro.context.context.KedroContextError: Pipeline contains no nodes
File contents:
src/project/pipelines/data_engineering/nodes.py
def preprocess_data(data: SparkDataSet) -> None:
    print(data)
    return
src/project/pipelines/data_engineering/pipeline.py
def create_pipeline(**kwargs):
    return Pipeline(
        [
            node(
                func=preprocess_data,
                inputs="data",
                outputs="preprocessed_data",
                name="preprocessing_data",
            ),
        ]
    )
src/project/pipeline.py
def create_pipelines(**kwargs) -> Dict[str, Pipeline]:
    de_pipeline = de.create_pipeline()
    return {
        "de": de_pipeline,
        "__default__": Pipeline([])
    }
I think you need to add the pipeline to __default__. For example:
def create_pipelines(**kwargs) -> Dict[str, Pipeline]:
    de_pipeline = de.create_pipeline()
    return {
        "de": de_pipeline,
        "__default__": de_pipeline
    }
Then kedro run --node=preprocessing_data works for me.
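For completeness, here is a minimal sketch of the full src/project/pipeline.py under the same assumptions (the import path is inferred from the file layout in the question and may differ in your project):

from typing import Dict

from kedro.pipeline import Pipeline

from project.pipelines.data_engineering import pipeline as de


def create_pipelines(**kwargs) -> Dict[str, Pipeline]:
    # Build the data engineering pipeline once, then expose it both
    # under its own name and as the default pipeline.
    de_pipeline = de.create_pipeline()
    return {
        "de": de_pipeline,
        "__default__": de_pipeline,
    }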
Mayurc is correct: there are no nodes because your __default__ pipeline is empty. Another option is to run just the de pipeline from the CLI:
kedro run --pipeline de
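If you would rather keep __default__ meaningful as well, a common pattern is to register it as the sum of the named pipelines, so a bare kedro run executes everything while --pipeline still selects a subset. A minimal sketch, assuming a second, hypothetical data_science pipeline alongside de:

from typing import Dict

from kedro.pipeline import Pipeline

from project.pipelines.data_engineering import pipeline as de
from project.pipelines.data_science import pipeline as ds  # hypothetical second pipeline


def create_pipelines(**kwargs) -> Dict[str, Pipeline]:
    de_pipeline = de.create_pipeline()
    ds_pipeline = ds.create_pipeline()
    return {
        "de": de_pipeline,
        "ds": ds_pipeline,
        # Kedro Pipeline objects support +, which concatenates their nodes.
        "__default__": de_pipeline + ds_pipeline,
    }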
You can find this option and more in the help text of the run command.
$ kedro run --help
Usage: kedro run [OPTIONS]

  Run the pipeline.

Options:
  --from-inputs TEXT        A list of dataset names which should be used as a
                            starting point.
  --from-nodes TEXT         A list of node names which should be used as a
                            starting point.
  --to-nodes TEXT           A list of node names which should be used as an
                            end point.
  -n, --node TEXT           Run only nodes with specified names.
  -r, --runner TEXT         Specify a runner that you want to run the pipeline
                            with. This option cannot be used together with
                            --parallel.
  -p, --parallel            Run the pipeline using the `ParallelRunner`. If
                            not specified, use the `SequentialRunner`. This
                            flag cannot be used together with --runner.
  -e, --env TEXT            Run the pipeline in a configured environment. If
                            not specified, pipeline will run using environment
                            `local`.
  -t, --tag TEXT            Construct the pipeline using only nodes which have
                            this tag attached. Option can be used multiple
                            times, what results in a pipeline constructed from
                            nodes having any of those tags.
  -lv, --load-version TEXT  Specify a particular dataset version (timestamp)
                            for loading.
  --pipeline TEXT           Name of the modular pipeline to run. If not set,
                            the project pipeline is run by default.
  -c, --config FILE         Specify a YAML configuration file to load the run
                            command arguments from. If command line arguments
                            are provided, they will override the loaded ones.
  --params TEXT             Specify extra parameters that you want to pass to
                            the context initializer. Items must be separated
                            by comma, keys - by colon, example:
                            param1:value1,param2:value2. Each parameter is
                            split by the first comma, so parameter values are
                            allowed to contain colons, parameter keys are not.
  -h, --help                Show this message and exit.
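A couple of usage sketches based on the options above (using the de pipeline and node name from this question; the --params values are the example from the help text):

$ kedro run --pipeline de --node preprocessing_data   # run one node of a named pipeline
$ kedro run --params param1:value1,param2:value2      # pass extra parameters to the context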
Posting a second answer because the full help output does not fit in a comment.