How do I persist dask DAGs on a distributed cluster across multiple calls and keep intermediate results?
I am trying to submit dask DAGs across multiple calls to a distributed client, but I cannot get the intermediate results to stay on the cluster. Can you point me to how this is done?
from distributed import Client
c = Client()
dsk0 = {'a': 1, 'b': (lambda x: 2*x, 'a')}
keys0 = ['a', 'b']
futures0 = c._graph_to_futures(dsk0, keys0)
fb = futures0['b']
b = fb.result() # Yields correctly 2
dsk1 = {'c': (lambda x: 3*x, 'a')}
keys1 = ['c']
futures1 = c._graph_to_futures(dsk1, keys1)
fc = futures1['c']
c = fc.result() # Yields 'aaa', instead of 3
Thanks in advance!
Markus
I recommend using dask.delayed together with the client.compute method:
from dask import delayed
from distributed import Client
client = Client()
a = delayed(1)
b = delayed(lambda x: 2 * x)(a)
a_future, b_future = client.compute([a, b])  # futures keep their results in cluster memory
>>> b_future.result()
2
c = delayed(lambda x: 3 * x)(a_future)  # pass the future so 'a' is reused from the cluster, not recomputed
c_future = client.compute(c)
>>> c_future.result()
3
Internal functions like _graph_to_futures, which deal with graphs directly, are more error-prone and are generally meant for internal use.
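For completeness, the plain futures interface gives you the same behavior without building graphs by hand. The sketch below is illustrative (the lambdas and variable names are my own, not from the question): a result stays in cluster memory for as long as the client holds a future referencing it, so it can be passed into later submissions without being recomputed.

from distributed import Client

client = Client()

# The result behind a future stays on the cluster as long as the client
# holds a reference to it, so later calls can reuse it directly.
a_future = client.submit(lambda: 1)                  # compute 'a' once
b_future = client.submit(lambda x: 2 * x, a_future)  # reuses the cluster-held 'a'
c_future = client.submit(lambda x: 3 * x, a_future)  # no recomputation of 'a'

print(b_future.result())  # 2
print(c_future.result())  # 3

Once every future referencing 'a' has been released (for example with del, or by going out of scope), the scheduler is free to drop that result from cluster memory.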