Dealing with interdependent files in graph-parallel computation
I am trying to parallelize the following code (MCVE) by creating a task graph using dask.delayed (or by implementing a computational graph myself):
os.chdir('./kitchen1')
write_dough() # writes file ./dough
write_topping() # writes file ./topping
write_pizza() # requires ./dough and ./topping; writes ./pizza
I see two difficulties:

- write_dough does not return anything. z = x + y makes the dependency between variables explicit; this does not. Dask doesn't recommend relying on side effects. Is there an idiomatic solution? (A minimal illustration of the pattern dask expects follows this list.)
- os.chdir. How do I incorporate it into the computational graph?

(I don't care about parallelizing the file IO, performance, etc.)
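For reference, here is a minimal sketch of the pattern dask.delayed is built around, where dependencies are carried by return values passed as arguments (inc and add are made-up placeholder functions, not part of the original code):

import dask

@dask.delayed
def inc(x):
    return x + 1

@dask.delayed
def add(x, y):
    return x + y

x = inc(1)
y = inc(2)
z = add(x, y)       # dask records that z depends on x and y because they are passed in as arguments
print(z.compute())  # 5; x and y can run in parallel before add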
Here is my current solution. It adds complexity, and './kitchen1' appears everywhere, which is ugly. What would be an elegant solution?
# wrap the (modified) functions as delayed tasks; they now take the kitchen path,
# and write_pizza takes the dough/topping results so dask sees the dependencies
write_dough, write_topping, write_pizza = map(dask.delayed, (write_dough, write_topping, write_pizza))
dough = write_dough('./kitchen1')
topping = write_topping('./kitchen1')
pizza = write_pizza(dough, topping, './kitchen1')
I would recommend your current approach of passing the dependencies explicitly.
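To make that concrete, here is a minimal sketch of the explicit-dependency approach, assuming the write_* functions can be changed to take the kitchen directory as an argument and to return the path of the file they write (the file contents below are placeholders, not your real logic):

import os
import dask

@dask.delayed
def write_dough(kitchen):
    path = os.path.join(kitchen, 'dough')
    with open(path, 'w') as f:   # placeholder for the real work
        f.write('dough')
    return path                  # returning the path gives downstream tasks something to depend on

@dask.delayed
def write_topping(kitchen):
    path = os.path.join(kitchen, 'topping')
    with open(path, 'w') as f:
        f.write('topping')
    return path

@dask.delayed
def write_pizza(dough_path, topping_path, kitchen):
    # the input paths arrive as arguments, so the dependency is explicit rather than a
    # side effect, and no os.chdir is needed anywhere
    with open(dough_path) as d, open(topping_path) as t:
        contents = d.read() + t.read()
    path = os.path.join(kitchen, 'pizza')
    with open(path, 'w') as f:
        f.write(contents)
    return path

kitchen = './kitchen1'
os.makedirs(kitchen, exist_ok=True)
dough = write_dough(kitchen)
topping = write_topping(kitchen)
pizza = write_pizza(dough, topping, kitchen)
pizza.compute()   # write_dough and write_topping run in parallel, then write_pizza

Threading the kitchen path through as an argument is a bit repetitive, but it keeps every dependency visible to dask, which is exactly why the explicit approach is the one to keep.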