Way to designate a certain set of Airflow tasks to run before others (order-invariant)?
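The Airflow (v1.10.5) DAG looks like...
Is there a way to specify that all blue tasks should complete before the scheduler moves on to any downstream tasks? Currently the scheduler sometimes runs an entire branch of tasks before running the next blue task.
I'd like to avoid just arranging them in sequence (and using the trigger rule TriggerRule.ALL_DONE), since they don't actually have any logical order in which they need to complete, other than that they all need to finish before any other downstream task in any branch.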
Does anyone know of a way to do this (like some kind of "priority" task pool)? Or other workaround suggestions?
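To make the trigger-rule semantics in this thread concrete, here is a plain-Python sketch that models TriggerRule.ALL_DONE and TriggerRule.ALL_SUCCESS as predicates over the states of a task's upstream tasks. It does not import Airflow; the function names are hypothetical, and although the state strings mirror Airflow's, this is a simplified model rather than Airflow's own trigger-rule check.

```python
# Plain-Python model of two Airflow trigger rules; no Airflow import.
# Assumption: these four states are the ones that count as "finished".
FINISHED = {"success", "failed", "skipped", "upstream_failed"}


def all_done(upstream_states):
    """Models TriggerRule.ALL_DONE: every upstream task finished, whatever the outcome."""
    return all(state in FINISHED for state in upstream_states)


def all_success(upstream_states):
    """Models TriggerRule.ALL_SUCCESS: every upstream task succeeded."""
    return all(state == "success" for state in upstream_states)


states = ["success", "failed", "success"]
print(all_done(states), all_success(states))  # True False
print(all_done(["success", "running"]))  # False: one upstream task is still running
```

ALL_DONE ignores outcomes entirely, which is why chaining the blue tasks with it would impose an arbitrary order without requiring any of them to succeed.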
I asked this question on the Airflow mailing list, and this was the result...
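from airflow.utils.helpers import cross_downstream  # location in Airflow 1.10.x

# white, blue_*, green_*, yellow_* are operators already defined in the DAG
blue = [blue_a, blue_b, blue_c]
green = [green_a, green_b, green_c]
yellow = [yellow_a, yellow_b]
cross_downstream(from_tasks=[white], to_tasks=blue)  # white -> every blue
cross_downstream(from_tasks=blue, to_tasks=green)    # every blue -> every green
cross_downstream(from_tasks=green, to_tasks=yellow)  # every green -> every yellow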
This should create the required network of dependencies between tasks.
Here is a visualization:
https://imgur.com/a/2jqyqQO
This is the easiest solution and in my opinion the correct one.
However, if you don't want these dependencies, you can create a new
scheduling rule by editing the BaseOperator.deps property.
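In Airflow, BaseOperator.deps is a property returning the set of dependency checks the scheduler evaluates before running a task instance, so extending it in an operator subclass is the route suggested here. The sketch below does not use that API; it only models the idea in plain Python as an extra gate a task must pass, here "every task in a designated group has finished". The GroupFinishedDep class, its is_met method, can_run, and the state strings are all hypothetical.

```python
# Hypothetical, Airflow-free model of a custom scheduling gate.
FINISHED = {"success", "failed", "skipped", "upstream_failed"}


class GroupFinishedDep:
    """Hypothetical gate: met only when every task in `group` is in a finished state."""

    def __init__(self, group):
        self.group = set(group)

    def is_met(self, states):
        # `states` maps task_id -> state string; missing tasks count as not finished.
        return all(states.get(task_id) in FINISHED for task_id in self.group)


def can_run(task_deps, states):
    """A task may run only if all of its dependency gates are met."""
    return all(dep.is_met(states) for dep in task_deps)


blue_gate = GroupFinishedDep(["blue_a", "blue_b", "blue_c"])
states = {"blue_a": "success", "blue_b": "running", "blue_c": "failed"}
print(can_run([blue_gate], states))  # False: blue_b is still running
states["blue_b"] = "success"
print(can_run([blue_gate], states))  # True
```

Attaching such a gate to every green task would hold the greens back until all blues are done without adding blue-to-green edges, which is the trade-off the reply describes.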
Documentation for the cross_downstream DAG-building helper can be found here: https://airflow.apache.org/docs/stable/concepts.html#relationship-helper
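cross_downstream makes every task in from_tasks upstream of every task in to_tasks, i.e. it adds the full cross product of edges. The plain-Python sketch below models only that edge-building behavior, assuming tasks are represented by their names as strings; cross_edges is a hypothetical name and the code does not call Airflow.

```python
from itertools import product


def cross_edges(from_tasks, to_tasks):
    """Return (upstream, downstream) pairs linking every from-task to every to-task."""
    return {(up, down) for up, down in product(from_tasks, to_tasks)}


blue = ["blue_a", "blue_b", "blue_c"]
green = ["green_a", "green_b", "green_c"]
yellow = ["yellow_a", "yellow_b"]

edges = (
    cross_edges(["white"], blue)
    | cross_edges(blue, green)
    | cross_edges(green, yellow)
)

# Every green now waits on all three blues, not just the one in its branch.
upstream_of_green_a = sorted(up for up, down in edges if down == "green_a")
print(upstream_of_green_a)  # ['blue_a', 'blue_b', 'blue_c']
print(len(edges))  # 18 = 3 + 9 + 6
```

This also shows why the approach is strict: under the default all-success trigger rule, a single failed blue task would block every green task, not just the one in its own branch.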
This is a helpful solution, but...
One thing about my case is that the next task (green) in each branch should only run if the blue task in that same branch completes successfully; it should not care about the success/failure status of the other blue tasks, only that they have run. So I don't think the ALL_DONE trigger rule will help the greens, and ALL_SUCCESS would be too strict.
Any ideas for such a thing?
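The condition described here combines two checks: the green task's own-branch blue task must have succeeded, and every blue task must have finished regardless of outcome. A plain-Python sketch of that predicate follows; the BRANCH mapping, green_ready, and the state strings are assumptions for illustration, not an Airflow trigger rule, and this is not the workaround the asker eventually used.

```python
FINISHED = {"success", "failed", "skipped", "upstream_failed"}

# Hypothetical mapping from each green task to the blue task in its own branch.
BRANCH = {"green_a": "blue_a", "green_b": "blue_b", "green_c": "blue_c"}
BLUE = list(BRANCH.values())


def green_ready(green_task, states):
    """Ready iff the own-branch blue succeeded AND every blue task has finished."""
    own_blue_ok = states.get(BRANCH[green_task]) == "success"
    all_blue_done = all(states.get(b) in FINISHED for b in BLUE)
    return own_blue_ok and all_blue_done


states = {"blue_a": "success", "blue_b": "failed", "blue_c": "success"}
print({g: green_ready(g, states) for g in BRANCH})
# {'green_a': True, 'green_b': False, 'green_c': True}
```

If each green were wired to all blue tasks, ALL_DONE alone would also let green_b run after its blue failed, while ALL_SUCCESS would block green_a and green_c; this predicate sits between the two.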
After some thought, here is my workaround...