Dask Locality, how to read from a local worker file?

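I am trying to read a unique local file from each worker, but I get the same result from all workers instead of a unique result per worker. Can someone point out what I am doing wrong?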

from dask.distributed import Client, progress
c = Client()
c

import dask.dataframe as dd

filename_1 = '/tmp/1990.csv'
filename_2 = '/tmp/1991.csv'
filename_3 = '/tmp/1992.csv'

future_1 = c.submit(dd.read_csv, filename_1, workers='172.18.0.3')
future_2 = c.submit(dd.read_csv, filename_2, workers='172.18.0.5')
future_3 = c.submit(dd.read_csv, filename_3, workers='172.18.0.6')

future_1.result().head()
future_2.result().head()
future_3.result().head()

I get the same result for all three, instead of unique data for each one.

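You probably want to use pandas.read_csv here rather than dask.dataframe.read_csv. dask.dataframe.read_csv only builds a lazy Dask collection, so the workers= restriction applies to building that collection and not to the actual read. The file is read later, when .head() triggers the computation, and that read is not pinned to the worker you asked for.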

https://docs.dask.org/en/latest/delayed-best-practices.html#don-t-call-dask-delayed-on-other-dask-collections
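As a minimal sketch of that suggestion, the snippet below submits pandas.read_csv itself, so the read runs eagerly inside the task on the worker named in workers=. The helper read_on_workers is a hypothetical name introduced here for illustration, not part of Dask. It assumes each path exists on the local disk of the worker it is mapped to.

```python
import pandas as pd


def read_on_workers(client, files_by_worker):
    """Submit one pandas.read_csv task per worker and return the futures.

    files_by_worker maps a worker address to a path that is assumed to
    exist on that worker's local disk. read_on_workers is a hypothetical
    helper for this example, not a Dask API.
    """
    return {
        # pandas.read_csv reads the file inside the task, so the read
        # happens on the worker the task is restricted to.
        worker: client.submit(pd.read_csv, path, workers=[worker])
        for worker, path in files_by_worker.items()
    }


# Usage with the addresses and paths from the question:
#
#     from dask.distributed import Client
#     c = Client()
#     futures = read_on_workers(c, {
#         '172.18.0.3': '/tmp/1990.csv',
#         '172.18.0.5': '/tmp/1991.csv',
#         '172.18.0.6': '/tmp/1992.csv',
#     })
#     futures['172.18.0.3'].result().head()
```

Each future here resolves to an ordinary pandas DataFrame, so .result() transfers the loaded data to the client. If the files are large, it may be better to keep working with the futures on the cluster instead of pulling every result back.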