How to run a pipeline except for a few nodes?

Question

I want to run a pipeline for different files, but some of them do not need all of the defined nodes. How can I skip those nodes?

Answer 1

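Would modular pipelines help? You could build two pipelines, one containing only the two "optional" nodes and another containing the rest, and then return a default pipeline that is the sum of the two. Something like this: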

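from kedro.pipeline import Pipeline, node

def create_pipelines(**kwargs):
    # node(...) is a placeholder: each real call needs func, inputs and outputs.
    # Pipeline expects a list of nodes.
    two_node_pipeline = Pipeline([node(...), node(...)])
    rest_of_pipeline = Pipeline([node(...), node(...), node(...), node(...)])

    return {
        "rest_of_pipeline": rest_of_pipeline,
        "__default__": two_node_pipeline + rest_of_pipeline,
    }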

Then you can use kedro run --pipeline rest_of_pipeline to run the pipeline without those two nodes, or kedro run to run the pipeline with the two extra nodes.

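Otherwise, I think it should be fairly easy to add an --except feature yourself by modifying kedro_cli, ProjectContext or run.py, whichever applies to your project. I might look into doing this...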
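
A minimal sketch of what such an --except option might do, kept independent of Kedro so that it runs on its own. The flag name, the StubNode class and the node names are all hypothetical; a real project would wire the option into the click-based run command in kedro_cli.py and filter pipeline.nodes instead of a hand-made list.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class StubNode:
    """Hypothetical stand-in for a Kedro node; only the name matters here."""
    name: str


def parse_excluded(argv):
    """Parse a hypothetical --except option into a set of node names."""
    parser = argparse.ArgumentParser()
    # 'except' is a Python keyword, so store the value under another attribute.
    parser.add_argument("--except", dest="excluded", default="")
    args = parser.parse_args(argv)
    return {name.strip() for name in args.excluded.split(",") if name.strip()}


def drop_excluded(nodes, excluded):
    """Keep every node whose name is not excluded, preserving order."""
    return [n for n in nodes if n.name not in excluded]


nodes = [StubNode("load"), StubNode("clean"), StubNode("plot"), StubNode("report")]
excluded = parse_excluded(["--except", "clean,plot"])
remaining = drop_excluded(nodes, excluded)
print([n.name for n in remaining])
```

Keep in mind that excluding a node whose outputs feed one of the remaining nodes leaves that node without its inputs, unless the dataset is already available from the catalog.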

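Kedro automatically orders the nodes with a topological sort (toposort), so the order in which you declare or combine them does not matter; see the previous answer.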
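
To illustrate what ordering by toposort means, here is a small sketch using Python's standard graphlib module (Python 3.9+). It is not Kedro's implementation; the node and dataset names are made up, and each node is reduced to its inputs and outputs.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes: name -> (input datasets, output datasets).
nodes = {
    "report": ({"features"}, {"summary"}),
    "clean": ({"raw"}, {"tidy"}),
    "featurize": ({"tidy"}, {"features"}),
}

# Map each dataset to the node that produces it.
producer = {out: name for name, (_, outs) in nodes.items() for out in outs}

# A node depends on whichever nodes produce its inputs.
# "raw" has no producer, so it is treated as an external input.
graph = {
    name: {producer[i] for i in ins if i in producer}
    for name, (ins, _) in nodes.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)
```

The nodes are declared in the order report, clean, featurize, yet the computed order follows the data dependencies.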

Answer 2

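To filter a few nodes out of a pipeline, you can simply filter the pipeline's node list from inside Python. My favorite way is to use a list comprehension.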

By name

nodes_to_run = [node for node in pipeline.nodes if 'dont_run_me' not in node.name]
run(nodes_to_run, io)

By tag

nodes_to_run = [node for node in pipeline.nodes if 'dont_run_tag' not in node.tags]
run(nodes_to_run, io)

You can filter on any attribute attached to the pipeline's nodes (name, inputs, outputs, short_name, tags).
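
The same pattern works for attributes other than name and tags. The sketch below filters on outputs, using a hypothetical StubNode stand-in so that it runs without a Kedro project; with a real pipeline you would iterate over pipeline.nodes instead. The dataset names are made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StubNode:
    """Hypothetical stand-in exposing a few attributes a pipeline node has."""
    name: str
    inputs: tuple = ()
    outputs: tuple = ()
    tags: frozenset = field(default_factory=frozenset)


nodes = [
    StubNode("clean", inputs=("raw",), outputs=("tidy",)),
    StubNode("plot", inputs=("tidy",), outputs=("figure",), tags=frozenset({"viz"})),
    StubNode("model", inputs=("tidy",), outputs=("predictions",)),
]

# Skip every node that writes the (made-up) "figure" dataset.
nodes_to_run = [n for n in nodes if "figure" not in n.outputs]
print([n.name for n in nodes_to_run])
```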

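If you need to run your pipeline this way in production or from the command line, you can either use tags to mark which parts of the pipeline should run, or add a custom click.option to the run function inside kedro_cli.py and apply this filter when the flag is True.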

Note: this assumes you have your pipeline loaded into memory as pipeline and your catalog loaded as io.

Answer 3

You can also use the --to-nodes CLI option: kedro run --to-nodes node1,node2. Internally this calls pipeline.to_nodes("node1", "node2") (see the method docs). Note that you still need to work out the list of "final" nodes that must run.
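
To make the selection rule concrete: --to-nodes keeps the named nodes plus everything upstream of them. The sketch below mimics that behaviour on a plain dependency dictionary. It is an illustration with made-up node names, not Kedro's code, and the to_nodes function here is a hypothetical helper rather than the Pipeline method.

```python
# Hypothetical dependency map: node name -> names of the nodes it depends on.
depends_on = {
    "clean": set(),
    "featurize": {"clean"},
    "train": {"featurize"},
    "plot": {"clean"},
    "report": {"train", "plot"},
}


def to_nodes(depends_on, *targets):
    """Return the targets plus every node they need, directly or transitively."""
    selected = set()
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name not in selected:
            selected.add(name)
            stack.extend(depends_on[name])
    return selected


print(sorted(to_nodes(depends_on, "train")))
```

Here plot and report are left out because train does not depend on them, which is why you still have to choose the right "final" nodes yourself.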

