Dask: how is the memory limit calculated in "auto" mode?

The documentation gives the following formula for "auto" mode:

$ dask-worker .. --memory-limit=auto # TOTAL_MEMORY * min(1, nthreads / total_nthreads)

My CPU specs:

Architecture:                    x86_64
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1

My memory specs:

MemTotal:       16282416 kB
MemFree:         1142108 kB
MemAvailable:    9397036 kB

When I run the dask-worker command, I get the following output:

distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          1
distributed.worker - INFO -                Memory:                   3.88 GiB
distributed.worker - INFO - -------------------------------------------------

Can someone explain how the 3.88 GiB memory figure is arrived at? It does not seem to match the formula above.

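I suspect nthreads refers to the number of threads this particular worker has available for scheduling tasks, while total_nthreads refers to the total number of threads available on your system.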

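The dask-worker CLI command has the same defaults as LocalCluster (see GitHub issue). Assume the LocalCluster defaults: n workers are started, where n is the number of cores available on your system, and each worker is assigned m threads, where m is the number of threads per core: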

n = 4  # number of cores
m = 1  # number of threads per core

TOTAL_MEMORY = 16282416  # kB (MemTotal)

TOTAL_MEMORY * min(1, m / (n * m))

> 4070604.0

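4070604 kB is about 3.88 GiB (taking kB as 1024 bytes, as /proc/meminfo does), which matches the Memory value in the worker log.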
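The arithmetic above can be checked end to end with a short sketch. It assumes the MemTotal figure comes from /proc/meminfo, where "kB" denotes 1024 bytes; the helper name auto_memory_limit is hypothetical and is not part of Dask's API.

```python
KIB = 1024      # assumption: /proc/meminfo "kB" means 1024 bytes
GIB = 2 ** 30


def auto_memory_limit(total_bytes, nthreads, total_nthreads):
    """Hypothetical helper: TOTAL_MEMORY * min(1, nthreads / total_nthreads), in bytes."""
    return int(total_bytes * min(1, nthreads / total_nthreads))


# Values taken from the question: MemTotal and a 4-core machine, 1 thread per worker.
mem_total_bytes = 16282416 * KIB
limit = auto_memory_limit(mem_total_bytes, nthreads=1, total_nthreads=4)
limit_gib = limit / GIB

print(f"{limit_gib:.2f} GiB")  # prints "3.88 GiB"
```

Reading kB as 1000 bytes instead would give about 3.79 GiB for the same figure, so the choice of unit accounts for the apparent mismatch with the log.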

See the documentation here:

https://docs.dask.org/en/latest/deploying-cli.html#dask-worker

--nthreads

Number of threads per process.

--nprocs

Deprecated. Use ‘--nworkers’ instead. Number of worker processes to launch. If negative, then (CPU_COUNT + 1 + nprocs) is used. Set to ‘auto’ to set nprocs and nthreads dynamically based on CPU_COUNT

--nworkers <n_workers>

Number of worker processes to launch. If negative, then (CPU_COUNT + 1 + nworkers) is used. Set to ‘auto’ to set nworkers and nthreads dynamically based on CPU_COUNT
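As a rough illustration of the negative-value rule quoted above, the sketch below applies (CPU_COUNT + 1 + nworkers) to a 4-CPU machine. The function resolve_nworkers is a hypothetical name used only for this example, not a Dask function, and the 'auto' setting is deliberately left out because the excerpt does not spell out how it splits workers and threads.

```python
def resolve_nworkers(nworkers, cpu_count):
    """Hypothetical helper mirroring the documented rule for negative values."""
    if nworkers < 0:
        # Documented rule: CPU_COUNT + 1 + nworkers
        return cpu_count + 1 + nworkers
    return nworkers


cpu_count = 4  # from the question's CPU specs
for requested in (2, -1, -2):
    print(requested, "->", resolve_nworkers(requested, cpu_count))
```

Under this rule, -1 means one worker per CPU and -2 means one fewer than that.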

See also the LocalCluster source for how the defaults are set: