What is the default directory where dask workers store results or files?
[mapr@impetus-i0057 latest_code_deepak]$ dask-worker 172.26.32.37:8786
distributed.nanny - INFO - Start Nanny at: 'tcp://172.26.32.36:50930'
distributed.diskutils - WARNING - Found stale lock file and directory '/home/mapr/latest_code_deepak/dask-worker-space/worker-PwEseH', purging
distributed.worker - INFO - Start worker at: tcp://172.26.32.36:41694
distributed.worker - INFO - Listening to: tcp://172.26.32.36:41694
distributed.worker - INFO - bokeh at: 172.26.32.36:8789
distributed.worker - INFO - nanny at: 172.26.32.36:50930
distributed.worker - INFO - Waiting to connect to: tcp://172.26.32.37:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 8
distributed.worker - INFO - Memory: 33.52 GB
distributed.worker - INFO - Local Directory: /home/mapr/latest_code_deepak/dask-worker-space/worker-AkBPtM
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Registered to: tcp://172.26.32.37:8786
distributed.worker - INFO - -------------------------------------------------
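What is the default directory where dask-worker keeps temporary files, such as task results or files that were uploaded from the client with the upload_file() method and downloaded by the worker?
For example: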
def my_task_running_on_dask_worker():
    # fetch the file from HDFS
    # process the file
    # store the file back into HDFS
    ...
By default, a dask worker creates a directory at ./dask-worker-space/worker-#######, where ####### is a random string for that particular worker.
You can change this location by passing the --local-directory keyword to the dask-worker executable.
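As a minimal sketch of how the --local-directory keyword might be passed when a worker is launched from a script: the helper name dask_worker_command and the path /tmp/dask-scratch are hypothetical and chosen only for illustration; the scheduler address is the one from the question.

```python
def dask_worker_command(scheduler_address, local_directory):
    """Build the argument list for starting dask-worker with a custom local directory.

    Hypothetical helper: it only assembles the command and does not start anything.
    """
    return ["dask-worker", scheduler_address, "--local-directory", local_directory]


# /tmp/dask-scratch is an assumed example path, not a dask default.
command = dask_worker_command("172.26.32.37:8786", "/tmp/dask-scratch")
print(" ".join(command))
```

The resulting list could be handed to subprocess.Popen(command) on a machine where dask-worker is installed. The worker's "Local Directory" log line then shows where it actually placed its files.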
The warning you see in this line
distributed.diskutils - WARNING - Found stale lock file and directory '/home/mapr/latest_code_deepak/dask-worker-space/worker-PwEseH', purging
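says that a Dask worker noticed that another worker's directory had not been cleaned up, presumably because that worker failed in some hard way. The current worker is cleaning up the space left behind by the previous one.
Edit
You can see which worker created which directory either by looking at each worker's logs (they print out their local directory)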
$ dask-worker localhost:8786
distributed.worker - INFO - Start worker at: tcp://127.0.0.1:36607
...
distributed.worker - INFO - Local Directory: /home/mrocklin/dask-worker-space/worker-ks3mljzt
or programmatically by calling client.scheduler_info()
>>> client.scheduler_info()
{'address': 'tcp://127.0.0.1:34027',
'id': 'Scheduler-bd88dfdf-e3f7-4b39-8814-beae779248f1',
'services': {'bokeh': 8787},
'type': 'Scheduler',
'workers': {'tcp://127.0.0.1:33143': {'cpu': 7.7,
...
'local_directory': '/home/mrocklin/dask-worker-space/worker-8kvk_l81',
},
...
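The output above can be reduced to a mapping from worker address to local directory. This is a sketch, assuming the dictionary has the 'workers' and 'local_directory' keys shown above; worker_local_directories is a hypothetical helper name, and a hand-written sample dictionary stands in for the result of client.scheduler_info() so that the snippet runs without a cluster.

```python
def worker_local_directories(scheduler_info):
    """Map each worker address to the local directory it reported.

    Expects a dict shaped like the scheduler_info() output shown above.
    Workers without a 'local_directory' entry map to None.
    """
    workers = scheduler_info.get("workers", {})
    return {
        address: details.get("local_directory")
        for address, details in workers.items()
    }


# Stand-in for client.scheduler_info(), trimmed to the keys used here.
sample_info = {
    "type": "Scheduler",
    "workers": {
        "tcp://127.0.0.1:33143": {
            "local_directory": "/home/mrocklin/dask-worker-space/worker-8kvk_l81",
        },
    },
}

for address, directory in worker_local_directories(sample_info).items():
    print(address, directory)
```

With a connected client, the same helper would be called as worker_local_directories(client.scheduler_info()).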