How can I keep a PBSCluster running?

Question

I have access to a cluster running PBS Pro and would like to keep a PBSCluster instance running on the head node. My current (obviously broken) script is:

import dask_jobqueue

from paths import get_temp_dir


def main():
    temp_dir = get_temp_dir()
    scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
    cluster = dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1, scheduler_options=scheduler_options)


if __name__ == '__main__':
    main()

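This script is obviously wrong: once the cluster has been created, main() returns and the cluster is destroyed. I suppose I need to call some kind of execute_io_loop function, but I can't find anything like that in the API.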

So, how can I keep my PBSCluster alive?

Answer 1

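I think the Python API (advanced) section of the documentation may be a good place to start for solving this.

Note that it is an example of how to create a Scheduler and Workers, but I assume the same logic can be applied in a similar way to your case.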

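import asyncio

import dask_jobqueue

from paths import get_temp_dir


async def create_cluster():
    temp_dir = get_temp_dir()
    scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
    cluster = dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1, scheduler_options=scheduler_options)
    # Wait on an event that is never set, so the coroutine (and the cluster it holds) stays alive until the process is interrupted.
    await asyncio.Event().wait()


if __name__ == "__main__":
    asyncio.run(create_cluster())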

You may need to adapt the code a little, but it should keep your create_cluster running until it completes.

Let me know if this works for you.
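For comparison, here is a minimal sketch of a way to keep the process (and therefore the cluster) alive without asyncio: create the cluster, then block on a threading.Event until SIGINT or SIGTERM arrives, and close the cluster on the way out. This is not taken from the dask-jobqueue documentation. The names keep_alive, make_pbs_cluster and stop_event are hypothetical helpers introduced for illustration, and the sketch assumes only that the cluster object has a close() method, as Dask cluster objects do. The PBSCluster arguments are copied unchanged from the question.

```python
import signal
import threading


def keep_alive(make_cluster, stop_event=None):
    """Create a cluster, block until a stop is requested, then close it.

    `make_cluster` is any zero-argument callable returning an object with a
    `close()` method. `stop_event` is optional and mainly useful for testing.
    Returns the (now closed) cluster object.
    """
    if stop_event is None:
        stop_event = threading.Event()

    def _request_stop(signum, frame):
        stop_event.set()

    # Signal handlers can only be installed from the main thread.
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGINT, _request_stop)
        signal.signal(signal.SIGTERM, _request_stop)

    cluster = make_cluster()
    try:
        # Blocks here; `cluster` stays referenced, so it is not destroyed.
        stop_event.wait()
    finally:
        cluster.close()
    return cluster


def make_pbs_cluster():
    # Imported lazily so keep_alive() can be used without dask-jobqueue.
    import dask_jobqueue
    from paths import get_temp_dir  # the asker's own helper module

    temp_dir = get_temp_dir()
    scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
    return dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1,
                                    scheduler_options=scheduler_options)
```

In the script from the question, main() would then just call keep_alive(make_pbs_cluster). Stopping the process with Ctrl+C or kill sets the event and runs cluster.close(), rather than dropping the cluster abruptly.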

dask

dask-distributed

dask-jobqueue