Is R creating too many threads on startup?
Every invocation of R creates 63 sub-processes (shown by pstree as {R} threads):
Rscript --vanilla -e 'Sys.sleep(5)' & pstree -p $! | grep -c '{R}'
# 63
where the pstree output looks like this:
R(2562809)─┬─{R}(2562818)
├─{R}(2562819)
...
├─{R}(2562878)
├─{R}(2562879)
└─{R}(2562880)
Is this expected behavior?
This is a 72-core machine running Debian 9.3 with R 3.4.3, blas 3.7.0 and openmp 2.0.2:
dpkg-query -l '*blas*' 'r-base' '*lapack*' '*openmp*'|grep ^ii
ii libblas-common 3.7.0-2 amd64 Dependency package for all BLAS implementations
ii libblas-dev 3.7.0-2 amd64 Basic Linear Algebra Subroutines 3, static library
ii libblas3 3.7.0-2 amd64 Basic Linear Algebra Reference implementations, shared library
ii liblapack-dev 3.7.0-2 amd64 Library of linear algebra routines 3 - static version
ii liblapack3 3.7.0-2 amd64 Library of linear algebra routines 3 - shared version
ii libopenblas-base 0.2.19-3 amd64 Optimized BLAS (linear algebra) library (shared library)
ii libopenmpi-dev 2.0.2-2 amd64 high performance message passing library -- header files
ii libopenmpi2:amd64 2.0.2-2 amd64 high performance message passing library -- shared library
ii libopenmpt0:amd64 0.2.7386~beta20.3-3+deb9u2 amd64 module music library based on OpenMPT -- shared library
ii openmpi-bin 2.0.2-2 amd64 high performance message passing library -- binaries
ii openmpi-common 2.0.2-2 all high performance message passing library -- common files
ii r-base 3.4.3-1~stretchcran.0 all GNU R statistical computation and graphics system
R is using the openblas and openmp libraries:
Rscript --vanilla -e 'Sys.sleep(1)' & lsof -p $! |egrep -i 'blas|lapack|parallel|omp'
[1] 2574896
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
Output information may be incomplete.
R 2574896 foranw mem REG 0,20 13931603 /usr/lib/libopenblasp-r0.2.19.so (path dev=0,21)
R 2574896 foranw mem REG 0,20 13931604 /usr/lib/openblas-base/libblas.so.3 (path dev=0,21)
R 2574896 foranw mem REG 0,20 13840156 /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 (path dev=0,21)
Setting the BLAS/OpenMP environment variables controls the allocation. I am still not sure whether the observed 'use most of the cores' default is intentional or reasonable.
export OPENBLAS_NUM_THREADS=4 OMP_NUM_THREADS=4 MKL_NUM_THREADS=4
Rscript --vanilla -e 'Sys.sleep(1)' & pstree -p $! |wc -l
# 3
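The export above changes the setting for the whole shell session. As a minimal sketch of a per-command alternative, the wrapper below sets the same three variables only for the process it launches. The name run_capped is hypothetical, and the sketch assumes the BLAS/OpenMP runtime reads these variables at startup, as the pstree count above suggests.

```shell
#!/bin/sh
# run_capped N CMD [ARGS...]
# Hypothetical helper: run CMD with the BLAS/OpenMP thread pools capped at N.
# The variables are set only in the environment of CMD, not in the calling shell.
run_capped() {
    threads=$1
    shift
    env OPENBLAS_NUM_THREADS="$threads" \
        OMP_NUM_THREADS="$threads" \
        MKL_NUM_THREADS="$threads" \
        "$@"
}

# Example (requires R to be installed, so it is left commented out):
# run_capped 1 Rscript --vanilla -e 'Sys.sleep(5)' &
# pstree -p $! | grep -c '{R}'
```

Which of the three variables actually takes effect depends on which BLAS and OpenMP libraries R is linked against, so it is worth re-checking the thread count with pstree after changing them.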
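R itself is (famously) single-threaded.
I suspect this comes from libopenblas-base, which is (also known to be) multi-core.
Contrast this with a rocker container where we use libblas3, which is single-threaded and unoptimized: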
> system("pstree")
bash───R───sh───pstree
> system("ps -ax")
PID TTY STAT TIME COMMAND
1 pts/0 Ss 0:00 /bin/bash
579 pts/0 S+ 0:00 /usr/lib/R/bin/exec/R
583 pts/0 S+ 0:00 sh -c ps -ax
584 pts/0 R+ 0:00 ps -ax
>
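As the Debian maintainer for R, I take advantage of the fact that we have multiple BLAS / LAPACK builds. Base is fine, OpenBLAS is often faster (but be careful when you launch multiple cores from R via a different mechanism), and there is also Atlas. What is "best" will always get a firm "it depends".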
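The caveat about launching multiple cores by a different mechanism is about oversubscription: if several R worker processes each start their own full-size BLAS thread pool, the total thread count multiplies. One rough way to budget for that, sketched here with a hypothetical helper threads_per_worker and an assumed worker count of 8, is to give each worker an equal integer share of the cores:

```shell
#!/bin/sh
# threads_per_worker CORES WORKERS
# Hypothetical helper: integer share of CORES per worker, never less than 1.
threads_per_worker() {
    share=$(( $1 / $2 ))
    [ "$share" -ge 1 ] || share=1
    echo "$share"
}

# Example: the 72-core machine from the question, split across 8 workers.
threads_per_worker 72 8   # prints 9
```

The result could then be exported before starting R, for example OPENBLAS_NUM_THREADS=$(threads_per_worker 72 8). Whether an even split is the right choice depends on the workload, so treat it as a starting point rather than a rule.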