How to run parts of your Kedro pipeline conditionally?

Question

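I have a big pipeline that takes a few hours to run. A small part of it needs to run frequently. How do I run that part without triggering the entire pipeline?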

Answer 1

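There are multiple ways to specify which nodes or parts of your pipeline to run.

Use kedro run parameters such as --to-nodes/--from-nodes/--node to explicitly define what needs to run.
In kedro>=0.15.2 you can define multiple pipelines and then run only one of them with kedro run --pipeline <name>. If no --pipeline parameter is specified, the default pipeline is run. The default pipeline might combine several other pipelines. More information about using modular pipelines: https://kedro.readthedocs.io/en/latest/04_user_guide/06_pipelines.html#modular-pipelines
Use tags. Tag a small portion of your pipeline with something like "small" and then run kedro run --tag small. Read more here: https://kedro.readthedocs.io/en/latest/04_user_guide/05_nodes.html#tagging-nodes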
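If the frequent run is triggered from a script (for example a scheduled job), the command line can be assembled in Python. The following is a minimal sketch using only the standard library. `build_kedro_command` is a hypothetical helper, not part of Kedro, and it only covers the --pipeline and --tag options mentioned in this answer. It builds the argument list without executing anything.

```python
from typing import List, Optional


def build_kedro_command(pipeline: Optional[str] = None,
                        tag: Optional[str] = None) -> List[str]:
    """Hypothetical helper: assemble a `kedro run` call as an argument list."""
    command = ["kedro", "run"]
    if pipeline is not None:
        # Run one named pipeline instead of the default one.
        command += ["--pipeline", pipeline]
    if tag is not None:
        # Restrict the run to nodes carrying this tag.
        command += ["--tag", tag]
    return command


small_run = build_kedro_command(tag="small")
print(" ".join(small_run))
```

The resulting list could then be passed to `subprocess.run(small_run, check=True)` from the project root, assuming kedro is installed in that environment.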

Answer 2

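I would suggest setting up your tags or pipelines properly so they can be run from the CLI, as suggested by @idanov. It will be much easier for you in the long run as you move into production. I will also add that you can do a lot of ad hoc pipeline trimming and running from within Python. Here are some examples.

Filter by tags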

nodes = pipeline.only_nodes_with_tags('cars')

Filter by node

nodes = pipeline.only_nodes('b_int_cars')

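Filter nodes whose names contain a string

query_string = 'cars'
# collect the names of all nodes whose name contains the query string
node_names = [
    node.name
    for node in pipeline.nodes
    if query_string in node.name
]
nodes = pipeline.only_nodes(*node_names)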
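If substring matching is too coarse, the same idea works with glob-style patterns. This is only a sketch using the standard library's `fnmatch`. The node names are invented for illustration, and `names_matching` is a hypothetical helper; in a real project the list would come from `[node.name for node in pipeline.nodes]`.

```python
from fnmatch import fnmatch

# Made-up node names standing in for [node.name for node in pipeline.nodes].
node_names = ["a_raw_cars", "b_int_cars", "a_raw_trains", "c_report"]


def names_matching(names, pattern):
    """Return the names that match a shell-style pattern such as '*_cars'."""
    return [name for name in names if fnmatch(name, pattern)]


car_node_names = names_matching(node_names, "*_cars")
print(car_node_names)
```

The selected names can then be handed to `pipeline.only_nodes(*car_node_names)` in the same way as the substring filter.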

Only nodes with tags: or

nodes = pipeline.only_nodes_with_tags('cars', 'trains')

Only nodes with tags: and

raw_nodes = pipeline.only_nodes_with_tags('raw')
car_nodes = pipeline.only_nodes_with_tags('cars')
raw_car_nodes = raw_nodes & car_nodes

# equivalent: chaining the filters keeps only nodes that carry both tags
raw_car_nodes = (
    pipeline
    .only_nodes_with_tags('raw')
    .only_nodes_with_tags('cars')
)
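To make the difference between the "or" and "and" filters concrete, here is a small plain-Python model of the selection logic. It does not use Kedro and is not how Kedro is implemented. The node names, tags, and the helpers `with_any_tag` and `with_all_tags` are all made up for illustration.

```python
# Toy model: each (made-up) node name maps to its set of tags.
node_tags = {
    "raw_cars": {"raw", "cars"},
    "int_cars": {"cars"},
    "raw_trains": {"raw", "trains"},
}


def with_any_tag(*tags):
    """Nodes carrying at least one of the tags, like passing several tags at once."""
    wanted = set(tags)
    return {name for name, tag_set in node_tags.items() if tag_set & wanted}


def with_all_tags(*tags):
    """Nodes carrying every tag, like chaining one filter per tag."""
    wanted = set(tags)
    return {name for name, tag_set in node_tags.items() if wanted <= tag_set}


cars_or_trains = with_any_tag("cars", "trains")
raw_and_cars = with_all_tags("raw", "cars")
print(sorted(cars_or_trains), sorted(raw_and_cars))
```

In this model the "or" selection returns all three nodes, while the "and" selection returns only the node tagged with both 'raw' and 'cars'.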

Add pipelines

car_nodes = pipeline.only_nodes_with_tags('cars')
train_nodes = pipeline.only_nodes_with_tags('trains')
transportation_nodes = car_nodes + train_nodes

The above is an excerpt from my personal kedro notes.
